Video Streaming App Proof of Concept

The backstory

Mobile applications with video streaming have become a viable alternative to in-person meetings. Social media applications, doctor-on-demand services, and virtual events software also rely on video streaming functionality. 

At The APP Solutions, we often receive requests from potential clients for video streaming app development. Clients who consider The APP Solutions as a tech partner look for evidence that we have expertise in video streaming and social media app development. 

We decided to build a proof of concept of a video streaming application to demonstrate that we own this tech expertise. Below, we give a detailed description of the overall architecture, the feature list, and the technologies we applied. 

What is a Proof of Concept (POC)? 

A proof of concept (POC) is an early product version between the design and main development phases. POC has become a common way for startups and established businesses to test whether the idea they have is actually going to work since POC demonstrates that the project can be done. A proof of concept also creates a starting point for the development of the project as a whole.

Businesses need a proof of concept when there is no guarantee that the technical result is achievable because the project relies on complex architecture and new technologies, which is exactly the case for video streaming mobile applications. By developing a POC, developers and stakeholders get evidence that the project is viable.

Our challenge

Develop the proof of concept of a video streaming application with the basic functionality of a social media application. To achieve this goal, we needed to: 

Implement the following mobile screens and use cases: 

  • Sign up / Sign in. Users can sign up/sign in to the system.
  • View profile. Users can view their and other users’ profile data. 
  • Edit profile. Users can edit their profile data, such as name, avatar, and bio. 
  • Search. Users can search for other users by name and follow them. 
  • Start streaming. Users can start real-time video streaming.
  • View streams list. Users can view the list of active streams. 
  • Join the stream. Users can participate in the streaming of another user as a viewer. 

Integrate several authorization methods, such as: 

  • Email and Password
  • Google authorization
  • Facebook authorization
  • Apple authorization

Our Solution – Video Streaming App Proof of Concept  

We developed a proof of concept of a video streaming application with the basic functionality of a social media app to show off our tech expertise in live broadcasting and demonstrate how such a project may look. 

Implemented features:

  • Sign-in/Sign-up via email and password, Facebook, Google, and Apple ID.  
  • User Profile 
  • Search for followers, follow and unfollow functionality 
  • View the list of active video streams 
  • Broadcasting videos to subscribers and receiving reactions 

High-level Architecture vision 

[High-level architecture diagram]

Tech stack

  • Swift for iOS application
  • Firebase Realtime Database, which supports direct connectivity from mobile and web platforms as well as backend applications
  • Firebase for user authentication and authorization, data and image storing
  • Google Cloud Platform for hosting app’s back-end 
  • Python for application’s back-end 
  • Agora.IO, a SaaS for video broadcasting and participating in video streaming

Contributors

The repository with the POC code is available on the link.

How we developed a Streaming app proof of concept

Core

We built the app’s POC using an MVP+Router+Configurator architecture, with MVVM+Combine for list screens. 

We implemented dependency injection (DI) using a ServiceLocator singleton, which acts as a factory of abstract services.

Main services 

  • Keychain for saving JWT and Apple sign-in credentials.
  • Network, AuthorizedNetwork, TokenProvider, APIErrorParser for executing network requests. All requests have to conform to APIRequestProtocol, or to APIAuthorizedRequestProtocol for requests that include a token in their headers. 
  • TokenProvider for fetching a token from the keychain and refreshing it via Firebase if needed. If your app has to refresh a token using a backend request, go to Core/Networking/TokenProvider and rewrite this service to restore the token manually. 
  • FirebaseManager for authentication using email+password, verification ID, social media, password reset, logout, etc.
  • FirebaseDatabaseManager for obtaining followers list, fetching users, etc.
  • FirebaseStorage for setting and fetching an avatar.
  • AuthService is just a stub for validating Firebase JWT tokens. If your back-end requires JWT verification, insert a validation request into the validate method (see the back-end sketch after this list).
  • SearchService for fetching users with input from the search field.
  • FollowService for following/unfollowing the user fetched with SearchService.
  • UserService for updating user profile (name etc.).
  • StreamService for fetching a token to join an Agora channel, notifying the back-end about the start/end of the channel, subscribing to user reactions, sending reactions, etc.
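
The back-end of the POC is written in Python, so, as an illustration of what the AuthService validation step could look like on the server side, here is a minimal sketch using the firebase_admin SDK and Flask. The route name and response shape are our illustrative assumptions, not the actual POC code.

# Minimal sketch: validating a Firebase ID token on a Python (Flask) back-end.
# The route name and response shape are illustrative only.
import firebase_admin
from firebase_admin import auth
from flask import Flask, request, jsonify

firebase_admin.initialize_app()  # uses GOOGLE_APPLICATION_CREDENTIALS by default
app = Flask(__name__)

@app.route("/validate", methods=["POST"])
def validate():
    # The mobile client sends the Firebase JWT in the Authorization header.
    header = request.headers.get("Authorization", "")
    token = header.replace("Bearer ", "", 1)
    try:
        decoded = auth.verify_id_token(token)  # raises if the token is invalid or expired
        return jsonify({"uid": decoded["uid"]}), 200
    except Exception:
        return jsonify({"error": "invalid token"}), 401

if __name__ == "__main__":
    app.run(port=8080)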

Our results 

The development of the video streaming app proof of concept gave us the following expertise:

  • We integrated video streaming functionality into the POC using the Agora.IO SaaS.
  • We implemented authentication and authorization with Firebase Authentication.
  • We worked with the Firebase Realtime Database, which supports direct connectivity with end-user applications (mobile, web, etc.) as well as server-side applications.
  • We optimized the development process by applying ready-to-use Firebase functionality.

As a result, we showcased our expertise in video streaming app development. 

HYPR – exclusive luxury car ride experience app

The emergence of Uber and Lyft in the late 2000s turned the peer-to-peer ridesharing business and its service framework on its head. 

Unlike their more formalized taxi competition, they embraced the social networking element of ridesharing and fundamentally transformed the customer experience. 

Traditional taxi services applied straightforward transportation from point A to point B. In the case of Uber and Lyft, the service is about the experience, coupled with the act of conveyance. This particular feature allowed Uber and Lyft to gain a significant competitive advantage and become substantial players in the peer-to-peer ridesharing industry.

In addition to that, focus on experience broke new grounds in terms of developing different ridesharing niches for different types of audiences. 

Our company had a chance to work on such a peer-to-peer ridesharing experience service. In this article, we are going to tell you about it. 

HYPR project description

HYPR is an on-demand ridesharing service specialized in luxury vehicle riding experiences. In other words, it is an application that allows customers to take a ride to their destination in style and have a good time riding in an exclusive supercar. 

How does HYPR make a difference in this niche? 

  • Exclusive vehicle riding (especially supercar rides) is a close-knit activity. It is tough to get into unless you are already a part of this community by proxy.
  • In addition to this, luxury car rental services tend to be overpriced and riddled with document checks and insurance requirements due to a relative lack of competition.  
  • On the other hand, HYPR makes this experience available to subscribers without the overcomplicated procedures of rental services. 

Here’s how:

  • HYPR takes an Uber type of peer-to-peer ridesharing service model and implements it into a completely different use case. 
  • Car renting services focus on the vehicle itself. The riding experience needs to be figured out by the customer on his own.
  • Uber is ultimately about getting to the destination (bundled with a quality service).
  • On the other hand, HYPR is about the journey in a specific type of vehicle. 

There is a shift from strictly transportation services towards experiential commerce.

Because of that, there is a stronger emphasis on the social networking element. 

  • Users get a newsfeed with a selection of current ride opportunities. 
  • On the other side, drivers manage their bookings and announce ride opportunities. 
  • This activity creates an engagement loop in which the user’s exchange of experiences encourages further use of the service.

Project technical specification

The core functionality of the HYPR app is a variation of the taxi app concept. 

There is a mobile application for customers. Its features include: 

  • Journey specification and vehicle selection;
  • Event newsfeed;
  • The real-time monitoring of vehicle movement when it is nearing the customer.

The other elements of the service include:

  • Subscription website – to submit registration forms and, subsequently, for subscription management;
  • Admin Dashboard, which contains the following features:
    • General application activity overview;
    • In-app analytics;
    • Driver dashboard with activity status and booking management features;
    • Vehicle and ride price management;
    • Form applicant and customer management;
    • Customer support operations;
    • Damage reports and statistics review (vehicle demand, ride stats, etc.);
  • Driver’s app, which includes:
    • Activity Dashboard;
    • Booking list – with current, past, and upcoming bookings;
    • User Profile.

Key Solutions

Payment Processor integration

Transparent payment processing is one of the requirements for efficient customer service. The HYPR application required a reliable and accessible payment processor to handle: 

  • Subscription fee management;
  • Ride payments.

After a thorough examination of the available options, the most suitable choice for the application was Stripe. It was the best gateway in terms of fitting the requirements: 

  • Ease of use;
  • Merchant account features;
  • Multiple payment methods;
  • PCI DSS compliance + AVS, SSL, CCV features;
  • Flexible API.
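
For illustration, here is a minimal sketch of what the two payment flows look like with Stripe's Python SDK; the amounts, currency, price ID, and description are placeholders rather than the actual HYPR configuration.

# Minimal sketch of the two payment flows with Stripe's Python SDK.
# Amounts, currency, customer handling, and the price ID are illustrative placeholders.
import stripe

stripe.api_key = "sk_test_..."  # secret key from the Stripe dashboard

def charge_ride(customer_id: str, amount_pence: int) -> stripe.PaymentIntent:
    """Create and confirm a payment for a completed ride."""
    return stripe.PaymentIntent.create(
        amount=amount_pence,          # smallest currency unit, e.g. pence
        currency="gbp",
        customer=customer_id,
        description="Ride payment",
        confirm=True,                 # assumes the customer has a default payment method
    )

def create_subscription(customer_id: str, price_id: str) -> stripe.Subscription:
    """Start a recurring subscription for the membership fee."""
    return stripe.Subscription.create(
        customer=customer_id,
        items=[{"price": price_id}],
    )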

Google Maps / Google Places Integration

The other significant component that needed third-party integration was geolocation:

  • The application needed a solution for general map navigation and vehicle monitoring. 
  • In addition to that, the map needed as much additional relevant information regarding different locations as possible.

We’ve used: 

  • Google Maps for general web mapping and vehicle movement monitoring;
  • Google Places to streamline navigation and provide additional information regarding different locations.
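
As a rough, hypothetical sketch of the server-side part of this integration, the snippet below uses the googlemaps Python client to geocode a pickup address, build a route, and fetch nearby places; the API key, addresses, and search parameters are placeholders.

# Illustrative sketch of Google Maps / Google Places server-side calls.
# The API key, addresses, and search parameters are placeholders.
import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")

# Geocode the pickup address entered by the customer.
geocode = gmaps.geocode("221B Baker Street, London")
pickup = geocode[0]["geometry"]["location"]  # {'lat': ..., 'lng': ...}

# Driving route from the pickup point to the destination.
route = gmaps.directions(origin=pickup,
                         destination="Heathrow Airport, London",
                         mode="driving")

# Nearby places, used to enrich the map with relevant location information.
places = gmaps.places_nearby(location=pickup, radius=500, type="point_of_interest")
for place in places.get("results", []):
    print(place["name"], place.get("vicinity"))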

Firebase integration

You can’t go far without proper data analytics. Understanding the state of the application, the way customers are using it, and how efficiently it operates depends on thorough analytics. 

The app needed a practical and accessible mobile analytics solution to gain insights into application use and user engagement.

We used Firebase because of its ease of use and flexibility. With its help, the company can see what is going on in the application and react appropriately.

Handling data security

Data security is one of the significant challenges that come with the development of any application that deals with sensitive data.

The main requirement was GDPR compliance, with its strict guidelines for user data management, transparent data use, and guaranteeing the safety of data. 

The following solutions were used to provide appropriate data security measures:

  • HTTPS, TLS, SSH for data-in-transit encryption; 
  • bcrypt for hashing passwords stored in the database (sketched after this list); 
  • DDoS protection;
  • PCI DSS compliance;
  • Activity Logging + Access Management;
  • Limited access to the production database at the network level.
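
For the bcrypt item above, here is a minimal sketch of how password hashing and verification typically look in Python; it is illustrative only and not lifted from the HYPR code base.

# Minimal sketch of hashing and verifying passwords with bcrypt.
import bcrypt

def hash_password(plain: str) -> bytes:
    # gensalt() embeds a per-password salt and a work factor into the hash.
    return bcrypt.hashpw(plain.encode("utf-8"), bcrypt.gensalt(rounds=12))

def check_password(plain: str, hashed: bytes) -> bool:
    # checkpw re-hashes the candidate with the stored salt and compares the results.
    return bcrypt.checkpw(plain.encode("utf-8"), hashed)

stored = hash_password("s3cret-passphrase")
assert check_password("s3cret-passphrase", stored)
assert not check_password("wrong-guess", stored)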

Transport for London compliance 

At the point of application launch, HYPR operates in London. In order to get a license to operate in London, the application should be compliant with Transport for London (TfL) requirements.

In essence, TfL compliance means that the application should provide the following information upon request: 

  • Lost/Found property
  • Complaint/Compliment form
  • Vehicle specifications
  • Bookings record
  • Private hire driver agreement

In order to enable the gathering of this data conveniently, we augmented the database with a couple of metatags that bring together all the required information with a click or two.

Tech Stack

  • Subscription website – JavaScript/PHP 
  • Consumer App – iOS 
  • Admin Dashboard – Sonata 
  • Stripe Payment Processor
  • Google Maps + Google Places
  • Google Analytics for Firebase

Personnel

  • Project Manager
  • Business Analyst
  • 2 QA engineers
  • Front-end developer
  • 2 Back-end developers
  • 2 iOS developers

Conclusion

HYPR application showcases the potential of transportation-related applications combined with niche fields. 

  • The application streamlined the luxury ride experience service to a couple of clicks. The app provides room for further elaboration of the ridesharing service framework.
  • On the other hand, the application’s use of social networking greatly expands its opportunities in user engagement.

For our company, it was a great experience in applying a familiar framework in a different configuration. We developed a solution that is easy for both users and admins. 

During the development of this project, we utilized a streamlined, agile workflow. This approach helped us to deploy an operating prototype of the system ahead of the planned date and dedicate more time to its testing and refinement. 

Personalized travel recommendation chatbot: the Case Study

 Our journey

Our client is a Polish travel marketplace where travelers can get in touch with travel agents from partnering agencies to select the best trip. The client's primary sales channel was their website, but with the recent trend of communicating with businesses via social networks, the company decided to use a Facebook Messenger chatbot as an additional sales channel. Apart from providing a more personalized booking experience for travelers, the Facebook Messenger chatbot also became an additional monetization channel, since the company offers the chatbot integration to partners who want to stand out from others on the list. 

The client hired us since they were inspired by AI Versus, a chatbot project we participated in. 

Project key facts

Location: Poland

Project goals: Develop a Facebook Messenger chatbot for personalized trip search.

Team composition:

  • 2 back-end developers 
  • 1 front-end developer 
  • 1 designer
  • 2 QA engineers
  • 2 DevOps engineers

Project timeframe: 2.5 months

Obstacles: Strict control from Facebook developers, who tested and reviewed the bot for 3 weeks.


Setting Goals

To achieve the client’s business goals, we outlined the main requirements for the chatbot MVP:

  • Integrate the chatbot with the client’s databases of partner agencies
  • Create a chatbot flow to help travelers to search for trips
  • Integrate search filters that will include the following parameters:

– Dates 

– Number of adults 

– Number of Infants

– Meal 

– Hotel stars

  • Allow the chatbot to switch to a human agent

How we did it 

Step 1. Development of an authorization page for agencies

  • We created the authorization page design and integrated the default Facebook authorization functionality for partnering agencies.
  • We built a chatbot admin panel using Sonata and wrote a guide for the client on how to use it. 

Step 2. Development of chatbot flow

  • The main goal was to empower the chatbot with scripted answers for all possible questions since the MVP scope did not include open questions.
  • Each change in the chatbot was carefully reviewed and tested by the Facebook development team since the social network’s main concerns were user security and privacy. 
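
In production, the bot flow logic is orchestrated in Node-RED (see the tech stack below); purely for illustration, here is a minimal Python/Flask sketch of how a Messenger webhook receives a message and replies with a scripted answer. The verify token, page token, API version, and answers are placeholders.

# Illustrative Facebook Messenger webhook in Flask (the production flow runs in Node-RED).
# VERIFY_TOKEN, PAGE_ACCESS_TOKEN, the Graph API version, and the scripted answers are placeholders.
import requests
from flask import Flask, request

app = Flask(__name__)
VERIFY_TOKEN = "my-verify-token"
PAGE_ACCESS_TOKEN = "page-access-token"
GRAPH_URL = "https://graph.facebook.com/v12.0/me/messages"

SCRIPTED_ANSWERS = {
    "hi": "Hello! Where would you like to travel?",
    "help": "I can search trips by dates, number of adults and infants, meal plan, and hotel stars.",
}

@app.route("/webhook", methods=["GET"])
def verify():
    # Facebook calls this once to verify the webhook subscription.
    if request.args.get("hub.verify_token") == VERIFY_TOKEN:
        return request.args.get("hub.challenge", "")
    return "Verification failed", 403

@app.route("/webhook", methods=["POST"])
def receive():
    payload = request.get_json(force=True)
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            sender = event["sender"]["id"]
            text = event.get("message", {}).get("text", "").lower().strip()
            reply = SCRIPTED_ANSWERS.get(text, "Please choose one of the menu options.")
            requests.post(
                GRAPH_URL,
                params={"access_token": PAGE_ACCESS_TOKEN},
                json={"recipient": {"id": sender}, "message": {"text": reply}},
            )
    return "ok", 200

if __name__ == "__main__":
    app.run(port=5000)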

Step 3. Integration with Data Service

  • We integrated the chatbot with the client’s API Data Server to show relevant trip search results from the client’s databases. 
  • The Facebook team requested the client company registration documents and a chatbot MVP demo. 
  • Then, the Facebook team gave us a list of changes to implement before the chatbot launch.  

Step 4.  Cloud server integration

  • Our DevOps specialist created scripts for the project structure deployment to the client’s servers. 
  • The client could not estimate the server workload, which affected the load testing criteria, the infrastructure needed for the project deployment and, therefore, the server cost. Thus, we decided to use Amazon Web Services as a cloud hosting solution. 
  • Finally, our DevOps specialist deployed the app to the client’s AWS account.

Technical details 

  • Facebook Messenger as a chatbot development platform 
  • Amazon Web Services as a cloud hosting solution 
  • MySQL for chatbot localization  
  • Sonata for the chatbot admin panel 
  • IBM Node-RED powered by Node.js as an orchestrator for microservices and bot flow logic 
  • Databases from the customer’s side

The future 

Currently, the travel chatbot MVP is in the final stage of testing and soon we will launch it on the client’s Facebook Page. With time, the client expects to integrate the following business logic:

  • More detailed search filters
  • In-chat payment
  • Scheduling the call with a travel agent.

How Chatbot Can Make an Efficient Patient Support System

Healthcare is one of those industries that embrace cutting-edge technologies and make the most of them. The reason for this is simple – new technologies help save people’s lives and increase the quality of life. 

The adoption of machine learning and natural language processing algorithms throughout the healthcare industry has helped to streamline and vastly improve workflows of different medical procedures, and as a result, has made them more effective for their causes. 

Some of the most prominent examples of such streamlining and improvements are patient support systems. Let’s explain why. 

What’s wrong with patient support?

The word that best describes the state of patient support in the healthcare industry is “overwhelming”. Unlike other fields of the healthcare industry, where the root of the problem lies in the methodology, in the case of patient support it is the scope of the operation. In other words, too much demand and too little supply. 

Just like regular customer support everywhere else, the primary issues are:

  • Workflow efficiency. Because of limited resources, there is a tendency towards bottlenecks in the support pipeline. This prolongs the processing of the request and subsequently stretches out the reply time for a single request. As a result, the request processing pipeline is severely undercut. 
  • Workforce turnaround. Due to the high workload and punishing schedules, support operators often burn out and quit. 
  • Availability of the service. It is hard to maintain a fully-fledged 24/7 support service, beyond simple Q&A automation, with a limited workforce. 
  • Bringing onboard new employees takes time.
  • Operational costs. In addition to employee salaries, there are infrastructural maintenance costs. 

In one way or another, these issues can be solved with process automation and the adoption of chatbots and Natural Language Processing. 

An NLP chatbot creates a win-win situation for both healthcare service providers and patients.

  • Companies are able to optimize the workflow;
  • Chatbot reduces the workload of the human operators while making the service available 24/7.
  • Patients get a much more engaging and efficient service.

Here’s how:

  • Conversational UI chatbots take over the majority of routine conversations, such as results notification and Q&A. Human operators are involved only in special cases;
  • Natural Language Processing provides a deeper dive into the intent and sentiment of the user’s requests; 
  • This information gives ground for process automation that increases the speed of delivery by up to 40%;
  • The implementation of the conversational interface chatbot lowers operational costs by up to 30%. 

Our company was approached to develop such a solution and implement it into the existing system infrastructure. Let’s look at how The App Solutions developed a Chatbot solution for Healthcare patient support. 

How can a chatbot create a more efficient patient support system?

The client had a patient support system that handled a wide scope of patient requests such as: 

  • Providing various notifications – like test results, examination registering, etc;
  • Solving emerging issues – for example, retrieving lost passwords or explaining how to use different features of the service;
  • Gathering user feedback on the support service itself, and other services of the company. 

The entire operation was handled by trained human operators who worked under a strictly regulated set of guidelines. 

And while the workflow was fine-tuned, it wasn’t sufficient for the scope of the operation. With over a million active users, the patient support system’s resources were stretched too thin. 

In addition to this, there were concerns regarding the use of sensitive information and the possibility of compromising the integrity of the users’ accounts.

Because of this, it was decided to completely overhaul the patient support system with cutting edge technologies. 

Our task on the project can be described as follows: to develop a reliable solution that would: 

  • Streamline the workflow of the customer support operation;
  • Keep sensitive information safe.

The key requirements were to:

  • Implement a chatbot and enable 24/7 support;
  • Implement process automation for basic conversations and actions;
  • Reduce the request resolution time;
  • Deploy the system on the cloud platform and make it more scalable for large scale data processing;
  • Keep the system fully compliant with the current privacy regulations.

Here’s how it went down:

Training a language model

Process automation and the chatbot interface require a pitch-perfect understanding of the request intent and the subsequent triggering of the right course of action. The understanding part is handled by the NLP language model. 

We have used a combination of Word2Vec and Doc2Vec to train the model and optimize the generative algorithm. 

[Example of Doc2Vec mechanism]

[Example of Word2Vec mechanism]

Due to the specifics of the healthcare topic, the use of open-source datasets is somewhat limited. They can be used to provide a groundwork for the model, but further training and optimization require more specific data taken directly from the system.  

In order to train a language model on the best fitting dataset – we compiled it ourselves from patient support conversations. We used unsupervised machine learning algorithms to explore patient support data and then applied supervised machine learning algorithms to shape it into a training dataset.
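
As a rough sketch of this training step, assuming the gensim library and a toy corpus in place of the real patient-support conversations:

# Illustrative training of Word2Vec and Doc2Vec on patient-support conversations (gensim).
# The corpus and hyperparameters are placeholders, not the production values.
from gensim.models import Word2Vec, Doc2Vec
from gensim.models.doc2vec import TaggedDocument

conversations = [
    "how do i reset my password",
    "when will my test results be ready",
    "i want to book an examination for next week",
]
tokenized = [c.split() for c in conversations]

# Word-level embeddings capture the vocabulary of patient requests.
w2v = Word2Vec(sentences=tokenized, vector_size=100, window=5, min_count=1, workers=4)

# Document-level embeddings represent whole requests, which helps intent matching.
tagged = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(tokenized)]
d2v = Doc2Vec(documents=tagged, vector_size=100, min_count=1, epochs=40)

# Embed a new request and find the most similar known one.
vector = d2v.infer_vector("need my lab results".split())
print(d2v.dv.most_similar([vector], topn=1))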

Optimizing chatbot

The chatbot was the main system component. With a language model intact – our goal was to construct the interface around it. 

The model was developed with Python’s NLTK and the ChatterBot library. After that, it was exposed to the web interface through a Flask API.

We implemented a couple of machine learning algorithms to determine the intent of the request and connected it with relevant actions. For example, if the patient asks about doctor working hours – the bot accesses the doctor’s calendar and provides the relevant information. 
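
A minimal sketch of how such a core can be wired together: ChatterBot for scripted replies, Flask for the web interface, and a toy intent-to-action mapping. The training phrases, the /chat route, and the calendar lookup are illustrative stand-ins for the real clinical integrations.

# Illustrative chatbot core: ChatterBot for replies, Flask for the web interface,
# and a toy intent-to-action mapping (the real integrations are with clinical systems).
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
from flask import Flask, request, jsonify

bot = ChatBot("PatientSupport")
ListTrainer(bot).train([
    "What are the doctor's working hours?",
    "The clinic is open from 9:00 to 17:00 on weekdays.",
])

def doctor_hours_action() -> str:
    # Placeholder for a real calendar lookup.
    return "Dr. Smith is available 9:00-17:00, Monday to Friday."

INTENT_ACTIONS = {"working hours": doctor_hours_action}

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    text = request.get_json(force=True).get("message", "")
    # Trigger an action if a known intent keyword is present, otherwise fall back to the bot.
    for keyword, action in INTENT_ACTIONS.items():
        if keyword in text.lower():
            return jsonify({"reply": action()})
    return jsonify({"reply": str(bot.get_response(text))})

if __name__ == "__main__":
    app.run(port=5000)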

The main challenge at this stage was making the interface accessible. Since the level of technical literacy of the users may vary – we needed to make the whole thing as simple to use as possible. 

In order to do this, we applied extensive A/B testing of the functional elements. This allowed us to streamline the interface design and also optimize the conversation design.

Implementing process automation with machine learning

After developing a working language model and constructing a conversational UI chatbot around it, our next step was to prepare the process automation routines that would be activated from the chatbot interface. 

In order to do that, we broke the task into three categories:

  • Service support automation – the ones related to the services themselves (such as booking an examination or requesting test results).
  • Maintenance automation – related to system support and general information (for example, how to retrieve a lost password or to proceed with checkout)
  • Switch to human operator scenario – for complicated or emergency cases

We identified keywords and intentions for action triggers with TF-IDF. 

In order to broaden the scope of the model, we combined them with a wide selection of phrase variations so that the routine would be activated through casually formulated input queries. 
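
A small sketch of the TF-IDF step with scikit-learn, extracting the most characteristic terms per request; the sample requests are invented for illustration.

# Illustrative TF-IDF keyword extraction with scikit-learn.
# The sample requests are invented; in the project, real support conversations were used.
from sklearn.feature_extraction.text import TfidfVectorizer

requests_text = [
    "please book an examination with a cardiologist",
    "how do i retrieve a lost password",
    "connect me to a human operator this is urgent",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(requests_text)
terms = vectorizer.get_feature_names_out()

# Top-weighted terms per request become candidate keywords for action triggers.
for i, row in enumerate(tfidf.toarray()):
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:3]
    print(f"request {i}: {[term for term, weight in top if weight > 0]}")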

Cloud Deployment 

In order to secure consistent system performance, we deployed the entire project into the cloud platform. 

In this way, the patient support chatbot can maintain a high turnaround of information in the system, and a large volume of different operations, without slowing down or experiencing technical issues. 

Google Cloud Platform autoscaling features provided a solid backbone for every operation in the system and neutralized possible scalability issues. 

Implementing Data Security solution 

Privacy and confidentiality are amongst the central concepts of healthcare services. In the case of patient support systems, this is one of the most important elements. Not only do you need to guarantee the security of data in general, but you also need to guarantee that the whole interaction between the service and the patient is absolutely confidential.

The whole data processing operation must be compliant with the Personal Information Protection and Electronic Documents Act (PIPEDA). 

In order to maintain PIPEDA compliance, we implemented the following solutions:

  • Provided a detailed description of how user personal data is used on the service;
  • Expanded a consent agreement for data processing upon registration;
  • Limited retention of data from deleted accounts to 30 days after termination.
  • Implemented Transport Layer Security (TLS) with a 256-bit Advanced Encryption Standard for data transit.

Tech Stack

  • Google Cloud Platform
  • NLTK for Python
  • Chatterbot library
  • Flask API
  • Word2Vec / Doc2Vec

Conclusion

The project reinvigorated the company’s patient support. 

  • The implementation of the chatbot interface cut operational costs in half. 
  • The NLP component increased the efficiency and availability of the service. As a result, the response time period was decreased by a third. 
  • The process automation allowed us to streamline the service’s workflow and minimize the role of human operators in the handling of sensitive data. 

On the other hand, this project was a huge accomplishment for our team. We developed a complex solution that managed to bring the whole patient support system to a new level with a much higher efficiency rate. 

During the development of this project, we utilized more streamlined workflows that allowed us to make the whole turnaround much faster. Because of this, we managed to deploy an operating prototype of the system ahead of the planned date and dedicated more time to its testing and refinement. 


Clear Project: Real-time Data Analytics & Content Moderation

Project Background 

The client had an online chat platform. The platform had been active for quite a while, which meant it needed a major facelift in order to keep going.

The primary issue with the platform was its increasingly poor manageability. The moderation tools were obsolete, which resulted in a slowly growing element of toxicity in the chatroom.

In order to make the chatrooms less toxic and more civil – the content moderation system needed a major overhaul. The main points were the detection of bullying and obscene content.  

The other major problem was scam/fraud and bot activity. As a niche public communication service, the platform needed to keep such things out of the system by all means necessary.

In addition to this, there were reasonable concerns about the privacy of conversations and user profiles due to the threat of cyberbullying and hacking by means of social engineering.

Finally, due to its age, the system needed an upgrade of its scaling capacity in order to deliver the best possible user experience without experiencing any fails and glitches in the process.

Project Details

Our task regarding the project can be described as:

  • Upgrade the existing online chat platform into a modern, scalable system that can handle a large workload, and arm it with relevant content moderation and fraud detection tools;
  • Deliver a highly scalable system with effective moderation, anti-fraud tools, and superior data protection compliance.

Our primary goal was to migrate the system to the Cloud Platform. These days, the cloud is the most fitting solution for any platform with a big workload – it is an easy solution for scaling and further development of the system.

The biggest task of the entire project was the development of the content moderation system: we needed a solution that would be effective without being overly intrusive into users’ conversations. 

The development of the fraud detection system was the other big challenge. Online chat platforms are fraud-prone – there is often something fishy going on. In addition to spam, there were more elaborate phishing scam schemes that needed to be taken care of. 

Last but not least was maintaining a high level of privacy and data safety. Due to the nature of the platform, there are always reasonable concerns about the privacy of conversations and user profiles. Because of this, we implemented a DLP (Data Loss Prevention) protocol that puts personal data out of reach of malicious individuals and erases personal data from analytics.

Challenges 

Scalability – BigQuery

An online chat platform is only as good as its scalability. Given that chat platforms harbor thousands of conversations at the same time, the system needs to stay afloat while processing a large amount of data. In addition to this, it needs to be safe from glitches and crashes that negatively affect the user experience. 

In order to provide the best possible environment for scalability – we decided to use the Google Cloud Platform. Its autoscaling features secure smooth and reliable data processing operations.  

Building a serverless data warehouse

We used BigQuery as a serverless data warehouse for database management: highly scalable, highly flexible, and simple to use.
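
To give an idea of how simple querying such a warehouse is, here is a small sketch with the BigQuery Python client; the project, dataset, table, and column names are invented for illustration.

# Illustrative BigQuery query over the chat analytics warehouse.
# The project, dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="chat-platform-analytics")

query = """
    SELECT user_id, COUNT(*) AS flagged_messages
    FROM `chat-platform-analytics.moderation.message_events`
    WHERE label = 'abusive'
      AND event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY user_id
    ORDER BY flagged_messages DESC
    LIMIT 20
"""

for row in client.query(query).result():
    print(row.user_id, row.flagged_messages)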

Content Moderation Tools

The biggest task of the entire project was the development of the content moderation system. We needed a specific solution that would be effective, but not overly intrusive into the user’s conversations. 

We used Google DataFlow as a foundational element. This system is built around moderation guidelines that describe the do’s and don’ts of the platform and include specific patterns of hate speech and bullying that are unwelcome on the platform.

Overall, the system:

  • Monitors the conversations
  • Performs topic modeling
  • Classifies its topic and context in case of alarm
  • Defines whether there is abusive or obscene behavior involved
  • Checks the image content
    • Blurs the image in case of obscenity
  • Bans the user in case of violence, spam, or other banned content

The important point was to avoid full-on user surveillance and turn the system on only in cases of the conversation content crossing the line and activating the algorithm.

Spam & Fraud Detection Tools

Fraud is one of the most biting issues of online chat platforms. Aside from toxic behavior and bullying – fraudulent activity is the third biggest issue plaguing anonymous online chats. While there is an uncontrollable element of social engineering at play – it is possible to cut the fraudsters off before the damage is done by implementing early warning systems. 

In order to do this, we implemented an automated system of fraud detection. It is built upon a database of examples of fraudulent behavior which is used as a reference point for subsequent operations. 

The solution includes:

  • Text classification – analyzing the content of messages for typical spam verbiage and potentially fraudulent messages.
  • Image Classification
  • Anomaly-based bot detection – flagging users whose behavior falls into a bot-like spam pattern.

Integration of Image Recognition

Given the nature of this particular online chat platform – it was important to keep an eye on image content as it could be one of the major ways of enacting cyberbullying, scams, and obscene behavior. 

Because of this, we implemented a CNN-based image recognition system with Google AutoML Vision that classifies images and takes action if something violates the guidelines.

There are two services at play:

  • Google Vision API for general image recognition
  • Google AutoML Vision as a platform-specific solution.

Together, these services analyze the image content that is sent in conversations. 

  • In cases where there is any semblance of gore or otherwise obscene content – the image is blurred. 
  • In cases where images are accompanied by wholesale, toxic behavior, with distinct patterns of hate speech and bullying – the user is fully banned.
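
A rough sketch of the general image-checking step with the Cloud Vision API's safe-search detection; the blur/ban thresholds are illustrative assumptions, and the platform-specific AutoML Vision model is not shown here.

# Illustrative safe-search check with the Google Cloud Vision API.
# The decision thresholds are placeholders; the platform-specific AutoML model is not shown.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

def moderate_image(image_bytes: bytes) -> str:
    image = vision.Image(content=image_bytes)
    annotation = client.safe_search_detection(image=image).safe_search_annotation
    likely = (vision.Likelihood.LIKELY, vision.Likelihood.VERY_LIKELY)
    if annotation.violence in likely:
        return "ban"    # violent content leads to a ban
    if annotation.adult in likely or annotation.racy in likely:
        return "blur"   # obscene content gets blurred
    return "allow"

with open("incoming_image.jpg", "rb") as f:
    print(moderate_image(f.read()))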

Superior Privacy – Data Loss Prevention

Maintaining privacy is one of the fundamental elements of an online chat platform. It is the foundation of trust and growth of the service. 

Because of this, the system needs to be secure from any sort of breaches and other compromises of personal data. 

In order to follow GDPR guidelines and maintain the appropriate level of privacy – we implemented a Data Loss Prevention tool.

This protocol monitors the content for sensitive information and deletes it – so that it is not identifiable in databases. 
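
For illustration, here is a minimal sketch of how the Cloud DLP API can de-identify sensitive values in a chat message before it reaches analytics; the project ID and the selected info types are placeholders.

# Illustrative de-identification of chat text with the Cloud DLP API.
# The project ID and the list of info types are placeholders.
import google.cloud.dlp_v2 as dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/chat-platform-analytics"

def redact(text: str) -> str:
    inspect_config = {
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}, {"name": "PERSON_NAME"}],
    }
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )
    return response.item.value

print(redact("Hi, I am John Doe, call me at +44 20 7946 0000"))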

Tech Stack

  • Google Cloud Platform
  • BigQuery
  • Google DataFlow
  • Image Recognition
  • Google Cloud Vision API
  • Google AutoML

Personnel

  • Project Manager
  • Business Analyst 
  • Data Engineer 
  • QA

Conclusion

This project can be considered a huge accomplishment for our team. What started out as a relatively simple overhaul of the system slowly evolved into a series of complex solutions that bring the platform to an entirely new level. 

We are especially proud of:

  • The flexible content moderation system that keeps things civil in the chatroom while not being overbearing or overly noticeable;
  • An effective fraud detection system that can handle various types of chat-based fraud with ease.

This project was also a big milestone for our team. Over the years, we have worked on different aspects of big data operations and developed many projects involving data processing and analytics. However, this project gave us the chance to create an entire system from the ground up, integrate it with the existing infrastructure, and bring it all to a completely new level.

During the development of this project, we utilized more streamlined workflows that allowed us to make the whole turnaround much faster. Because of this, we managed to deploy an operating prototype of the system ahead of the planned date and dedicated more time to its testing and refinement. 

Skin Cancer Classification Neural Network Case Study

The treatment of diseases using cutting edge technologies is one of the prominent features of the healthcare industry. If there is a piece of tech that can make a difference – it will get its piece of the action and prove its worth.

In this regard, neural networks have achieved their well-deserved spotlight. The use of different types of neural networks has proven to be an effective tool in detecting and classifying cancer before it is too late.

In this article, we will talk about:

  • The state of skin cancer diagnosis technologies;
  • Our case study.

What is the state of cancer research?

Skin cancer is known for its deadliness. If not treated properly, this type of cancer can spread to other parts of the body, and in the worst-case scenario – become fatal.

At the time of writing this piece, skin cancer is amongst the most common types of cancer. According to a Centers for Disease Control and Prevention study, the United States healthcare system deals with over 1.5 million new cases every year.

However, its treatment workflow leaves a lot to be desired.

The most common problem with skin cancer treatment is a late diagnosis. This is a common occurrence due to a combination of technical and management issues. 

  • The current healthcare system is overloaded and riddled with bottlenecks in patient management and especially in medical testing. In other words, things are moving way too slowly. 
  • This is bad news when it comes to cancer because timely diagnosis is one of the keys to effective treatment.
  • In addition to this, there is a lack of trained personnel to satisfy demand.

To make things worse, the technology behind diagnosis is not efficient enough to handle things. 

  • Detection and classification is the most critical and time-sensitive stage. 
  • Cancer diagnosis relies on a long series of clinical screenings, dermoscopic analysis, biopsies, and histopathological examinations. At best, this sequence takes months to complete. 
  • The whole process involves numerous professionals and continuous testing, yet it is only about 77% accurate.

Sounds grim, right? Well, there’s hope.

The rapid development of artificial intelligence and machine learning technologies, especially neural networks, can be a game-changer in cancer classification.

Our company was approached to develop a neural network solution for skin cancer diagnosis. Here’s how we achieved it.

How can neural networks handle skin cancer diagnosis?

The central machine learning component in the process of a skin cancer diagnosis is a convolutional neural network (in case you want to know more about it – here’s an article). 

  • CNN can handle the classification of skin cancer with a higher level of accuracy and efficiency than current methods.

The gist of the system is in the way it applies the cancer research body of knowledge and public health databases to perform its operation. 

  • Human medical professionals mostly rely on their knowledge, experience, and manual handling of the results data. 
  • However, they are prone to human error. 
  • On the other hand, neural networks are capable of processing large quantities of data and taking more factors into consideration.

Here’s how it works: 

  • Detection stage – the network scans the input images and flags suspicious elements;
  • Classification stage – the detected anomalies are further assessed with different filters; the key requirement is to gather as much data as possible in order to make an accurate recognition;
  • After this, the resulting data is verified by medical professionals against the available databases and subsequently added to the patient’s health record.

The implementation of the machine learning neural network into the process of skin cancer classification can significantly help with the following issues:

  • Streamline cancer diagnosis workflow – make it faster, more efficient and cost-effective.
  • Lessen the dependence on various medical professionals in the diagnosis process.
  • Reduce the delivery time of clinical testing results.
  • Increase the accuracy of clinical testing results.

The project description

The key requirements for the development of a skin cancer diagnosis neural network were the following:

  • System scalability; 
  • Accuracy of the results; 
  • Accessible interface with effective visualizations;
  • Cost-effectiveness of the infrastructure.

The main challenges of the implementation of neural networks to the cancer diagnosis workflow were the following:

  • Time-sensitivity:
    • Data processing takes time to complete, while, at the same time, there is a need to get results as soon as possible;
    • The algorithms require time for refinement and optimization.
  • Classification requires significant computational resources for input data.
  • The maintenance of such infrastructure is quite expensive.


In order to deal with these challenges, we decided to build an entire system on a cloud platform. 

  • This approach handles the scalability and time-sensitivity issues. 
  • At the same time, the use of the cloud platform allows limiting spending to only the resources actually used.

The system itself consists of the following elements:

  1. Image input
  2. Convolutional neural network for classification 
  3. Cloud Datastore 
  4. Integration with relevant databases
  5. Browser-based dashboard with results and visualizations

The system was developed with the following tools: 

  • HAM10000 dataset; 
  • ImageNet pre-trained models;
  • TensorFlow for VGG16 CNN
  • Apache Beam for data processing pipeline; 
  • D3 visualization package; 

The information transformation is performed in the following sequence:

  1. Input images are uploaded to the Cloud Storage and sent to CNN;
  2. Convolutional Neural Network processes input images:
    1. Anomaly detection algorithm rounds up the suspicious elements;
    2. The classification algorithm determines the type of anomaly.
  3. The results of the processing are then saved to the database; 
  4. After that, the results are summarized and visualized. 
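
For illustration, here is a condensed sketch of such a classification model in TensorFlow/Keras: a VGG16 backbone pre-trained on ImageNet with a small classification head for the HAM10000 lesion classes. The input size, head layers, and training parameters are our assumptions, not the production configuration.

# Illustrative VGG16 transfer-learning setup for skin-lesion classification (TensorFlow/Keras).
# The input size, head layers, and training parameters are placeholders.
import tensorflow as tf

NUM_CLASSES = 7  # HAM10000 distinguishes seven lesion categories

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep the ImageNet features, train only the head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auroc")])

# train_ds / val_ds would be tf.data.Dataset objects built from the HAM10000 images:
# model.fit(train_ds, validation_data=val_ds, epochs=10)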

Our Solutions

Training Convolutional Neural Networks

CNN was trained on a publicly available skin lesion dataset HAM10000.

The classifier takes into account the following criteria:

  • anatomic location of the lesion; 
  • patient profile characteristics (age, gender, etc);
  • lesion size, scale, and scale-crust;
  • telangiectasia and other vascular features;
  • semi translucency;
  • pink blush;
  • ulceration;
  • blue-grey ovoids;
  • dirt trails;
  • purple blotches;
  • pale areas;

The image classification algorithms included:

  • the decision forest classifier;
  • the random forest classifier.

The results were visualized as a confusion matrix, with the area under the receiver operating characteristic (AUROC) curve as the key metric. 
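
A small sketch of that evaluation step with scikit-learn, using dummy values in place of the real predictions:

# Illustrative evaluation: confusion matrix and AUROC with scikit-learn (dummy data).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 1, 1, 0, 1, 0])               # ground-truth lesion labels (binary here)
y_prob = np.array([0.1, 0.8, 0.6, 0.3, 0.9, 0.2])   # predicted probability of the positive class

print(confusion_matrix(y_true, (y_prob >= 0.5).astype(int)))
print("AUROC:", roc_auc_score(y_true, y_prob))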

Scalability & Cost-effectiveness

The scalability challenges are handled by the distributed services of the cloud infrastructure. 

  • Google Cloud Platform’s autoscaling features provide high scalability of the system. The system can handle as much workload as it needs to do the job. 
  • The data processing workflow consistency is provided by the Apache Beam framework. 

The use of cloud infrastructure cut the maintenance costs by half and made the system cost-effective. With the cloud solution, the maintenance costs are limited to the used resources. 

One of the key requirements of the project was to refine image recognition models of the convolutional neural network in order to provide more accurate results. 

  • CNN was trained on the HAM10000 dataset. This dataset includes samples of different types of skin cancer and its identifying elements. 

Integration of the system components

The biggest challenge of the project was to integrate different input, analysis and visualization tools into a coherent whole. 

  • The thing is – skin cancer diagnosis workflow is not designed with distributed services in mind. At the same time, its elements can be presented as such. 
  • Cloud computing power gives enough scalability to process input data in a shorter span of time. 

User Interface

The other big challenge of the project was interface accessibility. 

  • In order to maintain its usefulness, the system needed an accessible user interface. 
  • The key requirement was to present the data required in a clearly structured and digestible form.
  • In order to figure out the most appropriate design scheme – we applied extensive user testing. 
  • The resulting interface is an interactive dashboard with data visualization and reporting features.

Conclusion

The implementation of the convolutional neural network to the process of skin cancer diagnosis was a worthwhile test of the technology’s capabilities.

Overall, the results are as follows:

  • The average time of delivering test results is 24 hours. 
  • With the CNN classification, the results can be delivered in a matter of hours (one hour average, depending on the amount of input data).
  • The accuracy of results averages 90% (on testing data). 
  • In addition to that, the more the system is in action, the more efficient it gets at clinical data processing.
  • The operational costs are reduced by half. 

This project was a real test of strength for our team. We applied our cloud infrastructure expertise and our healthcare data science experience.

As a result, we’ve built a system that is capable of processing large quantities of data in a relatively short time. 

Case Study: Real-time Diagnostics from Nanopore DNA Sequencers

Data Analysis in Healthcare is a matter of life and death, and it’s also a very time-consuming task when you do not have the proper tools. When we are talking about sepsis – the dangerous condition when the body starts to attack its organs and tissues in attempts to fight off the bacteria or other causes – the risk of losing the patient due to sepsis increases by 4% with each hour.

The researchers from the University of Queensland and the Google Cloud Platform developers have teamed up with the APP Solutions developers to provide medical doctors with a tool to help patients before they suffer from septic shock.

With the emergence of nanopore DNA sequencers, this task becomes manageable and much more efficient. These sequencers stream raw data and generate results within 24 hours, which is a significant advantage, especially when doctors need to identify pathogenic species and antibiotic resistance profile.

The primary challenge, from the technical point of view, lies with data processing, which requires significant resources for processing and subsequent storage of incoming data. The APP Solutions team tackled the development of a cloud-based solution to solve this challenge.

About the Project: Nanopore DNA Sequencers

Our team worked on the cloud-based solution for Nanopore DNA Sequencing and developed a Cloud Dataflow pipeline integrated with the following technologies:

  • FastQ Record Aligner
  • JAPSA Summarizer
  • Cloud Datastore and App Engine
  • App Dashboard

The pipeline itself consists of the following elements:

  • Chiron base caller implemented as a deep neural-network
  • Detectors for species and antibiotic resistance genes
  • Databases for long-term experimental data storage and post-hoc analysis
  • A browser-based dynamic dashboard to visualize analysis results as they are generated

Overall, the system is designed to perform the following actions:

  • Resistance Gene Detection: this pipeline identifies antibiotic resistance genes present in a sample and points out actionable insights, e.g., what treatment regimen to apply to a particular patient.
  • Species Proportion Estimation: this pipeline estimates the proportion of pathogenic species present in a sample. Proportion estimation can be useful in a variety of applications including clinical diagnostics, biosecurity, and logistics/supply-chain auditing.

The software is open-source, built on the open-source packages:

  • JAPSA
  • TensorFlow
  • Apache Beam
  • D3

We have used Google Cloud to implement the data analysis application due to its scaling capacity, reliability, and cost-effectiveness. It includes a wide array of scalable features for Tensor Processing Units and AI accelerator microchips.

The transformation of information follows this sequence:

  1. Integration – files are uploaded to the Google Cloud Platform and streamed into the processing pipeline;
  2. Base-calling stage – machine learning model infers DNA sequences from electrical signals;
  3. Alignment stage – via a DNA database, the samples are analyzed to find pathogen sequences and other anomalies;
  4. Summarization stage – calculation of each pathogen’s percentage in the particular sample;
  5. Storage and visualization – the results are saved to Google Firestore DB and subsequently visualized in real-time with D3.js.
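
A simplified sketch of such a streaming pipeline in Apache Beam's Python SDK; the Pub/Sub topic and the stage functions are illustrative stand-ins for the real base-calling, alignment, and summarization components.

# Simplified Apache Beam streaming pipeline (the real stages call the base caller,
# the aligner, and the summarizer; here they are stand-in functions).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def base_call(record):      # infer a DNA sequence from the raw signal (stand-in)
    return {"sequence": record}

def align(read):            # match the sequence against the pathogen database (stand-in)
    read["species"] = "unknown"
    return read

def summarize(read):        # contribution of this read to per-species proportions (stand-in)
    return (read["species"], 1)

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "ReadRawSignals" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/nanopore-reads")
     | "BaseCalling" >> beam.Map(base_call)
     | "Alignment" >> beam.Map(align)
     | "Summarization" >> beam.Map(summarize)
     | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))
     | "CountPerSpecies" >> beam.CombinePerKey(sum)
     | "Log" >> beam.Map(print))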


Nanostream Project Tasks & Challenges

Ensuring Data Scalability

Nanopore Sequencer DNA Analysis is a resource-demanding procedure that requires speed and efficiency to be genuinely useful in serving its cause.

Due to the high volume of data and tight time constraints, the system needs to scale accordingly, which was achieved via the Google Cloud Platform and its autoscaling features. GCP secures smooth and reliable scalability for data processing operations.

To keep the data processing workflow uninterrupted no matter the workload, we used Apache Beam.

Refining Data Processing and Analysis Algorithms

Accuracy is the central requirement for the data processing operation in genomics, especially in the context of DNA Analysis and pathogen detection.

The project required a fine-tuned, tight-knit data processing operation with an emphasis on providing a broad scope of results in minimal time.

Our task was to connect the analytics application to the cloud platform and guarantee an effective information turnaround. The system was thoroughly tested to ensure the accuracy of results and efficiency of the processing.

Integrating with DNA Analysis Tools

DNA Analysis tools for Nanopore sequencers were not initially developed for cloud platforms and distributed services. The majority of the analysis tools were just desktop utilities, which significantly limited their capabilities. We needed to integrate the desktop-based DNA analysis tools into a unified, scalable system.

We have reinterpreted desktop-based DNA analysis tools for HTTP format and distributed them as web services, which made them capable of processing large quantities of data in a shorter timespan.

Securing Cost-Effectiveness & Reducing Overhead

Nanopore DNA Sequencers are a viable solution for swift pathogen analysis and more competent medical treatment. However, the maintenance of such devices can be a challenging task for medical facilities due to resource and personnel requirements. Also, the scope of its use is relatively limited in comparison with the required expenditures.

We moved the entire system to Google Cloud Platform to solve this issue, allowing the service to be accessed and scaled without unnecessary overhead expenses.

Developing Accessible User Interface

Machine learning and big data analysis systems can process a lot of data, but it is useless until the insights are presented in an understandable way. In the case of the Nanopore DNA Sequencing solution, the idea was to give the medical staff a tool that would help them make decisions in critical situations and save lives. Therefore, an accessible presentation was one of the essential elements of this research project.

The system needed an easy-to-follow and straightforward interface that provided all the required data in a digestible form, avoiding confusion.

To create the most convenient user interface design scheme, we have applied extensive user testing. The resulting user interface is an interactive dashboard with multiple types of visualization and reporting at hand that requires minimal effort to get accustomed to and start using it.

When it came to visualization, the initial format of choice was a pie chart. However, it was proven insufficient in more complex scenarios.

Because of that, we have concluded that there was a need to expand the visualization library and add a couple of new options, which was where the D3 data visualization library helped us out.

Throughout extensive testing, we have figured out that Sunburst diagrams are doing an excellent job of showing the elements of the sample in an accessible form.

Project’s Tech Stack & Team

There were many technologies involved, the majority of which had to do with big data analysis and cloud: 

  • JAPSA
  • TensorFlow
  • Chiron Base Caller
  • Google Cloud
  • Google Cloud Storage
  • Google Cloud PubSub
  • Google FireStore
  • Google Cloud Dataflow
  • Apache Beam
  • D3 Data Visualization Library
  • JavaScript


From the APP Solutions’ side, we had four people working on this Nanopore DNA Sequencers project: 

  • 2 Data Engineers
  • 1 DevOps Engineer
  • 1 Web Developer

Creating Nanopore DNA Sequencing Cloud-Based Solutions

This project was an incredible experience for our team. We had a chance to dive deep into the healthcare industry as well as machine learning, data analysis, and Google Cloud platform capabilities.

While exploring the possibilities of data analysis in healthcare applications, we found many parallels with data analysis in other fields.

We have managed to apply our knowledge of cloud infrastructure and build a system that is capable of processing large quantities of data in a relatively short time – and help doctors save patients’ lives!

Learn more about the project and check out our contributions on GitHub.


Case Study: Cross-Platform Data Analytics with Data Warehouse

The nature of enterprise companies is that they are reliant on the “big picture” – an overarching understanding of the things happening on the market in general as well as in the context of a particular product.

Data Analytics is the way of visualizing the information with a handy set of tools, which show how things are moving along. As such, it is an indispensable element of the decision-making process.

Understanding the big picture binds every source of information together in one beautiful knot and presents a distinct vision of past, present, and possible future.

In one way or another the big picture affects everything:

  • Day-to-day operations
  • Long-term planning
  • Strategic decisions

Big picture view is especially important when your company’s got more than one product, and the overall analytics toolbox is scattered.

One of our clients needed a custom big data analytics system, and that was the task set before the APP Solutions’ team of developers and PMs.

ECO: Project Setup

The client had several websites and applications with a similar business purpose. The analytics for each product were separate, so it took considerable time and effort to combine and assess them into a plain overarching view.

The dispersion of the analytics caused several issues:

  • The information about the users was inconsistent throughout the product line;
  • There was no real understanding of how the target audiences of each product overlap.  

There was a need for a solution that would gather information from different sources and unify it in one system.

Our Solution – Cross-Platform Data Analytics System

Since there were several distinct sources of information at play, which were all part of one company, it made sense to construct a nexus point where all the information would come together. This kind of system is called cross-platform analytics or embedded analytics.

Overall system requirements were: 

  • It has to be an easily-scalable system
  • It can handle big data streams
  • It can produce high-quality data analytics coming from multiple sources. 

In this configuration, the proposed system consists of two parts:

  • Individual product infrastructure – where data is accumulated;
  • Data Warehouse infrastructure – where information is processed, stored, and visualized.

Combined information streams would present the big picture of product performance and the audience overlap.

The Development Process Step by Step

Step 1: Designing the Data Warehouse

Data Warehouse is the centerpiece of the data analytics operation. It is the place where everything comes together and gets presented in an understandable form.

The main requirements for the warehouse were:

  • Ability to process a large amount of data in a real-time mode
  • Ability to present data analytics results in a comprehensive form.

Because of that, we needed to figure out a streamlined dataflow that will operate without much of a fuss.

There is a lot of data coming in as different types of user-related events:

  • clicks,
  • conversions,
  • refunds,
  • other input information.

In addition to storing the information, we needed to tie it to the analytics system, which required synchronizing the system elements (the individual products) so the analytics would stay relevant.

We decided to go with cloud infrastructure for its resource management tools and autoscaling features. It made the system capable of sustaining a massive workload without skipping a beat.

Step 2: Refining Data Processing Workflow

The accuracy of data and its relevance are critical indicators of the system working correctly. The project needed a fine-tuned system of data processing with an emphasis on providing a broad scope of results in minimal time.

The key criteria were:

  • User profile with relevant info and updates
  • Event history with a layout on different products and platforms

The system was thoroughly tested to ensure the accuracy of results and efficiency of the processing.

  • We used BigQuery’s SQL to give the data a proper query interface.
  • We used Google Data Studio and Tableau to visualize the data in a convenient form, thanks to their flexibility and accessibility.
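
As an illustration of the BigQuery interface, here is a minimal sketch of querying the warehouse with the BigQuery Python client. The dataset, table, and column names are hypothetical, not the project’s actual schema.

```python
from google.cloud import bigquery

# Assumes GOOGLE_APPLICATION_CREDENTIALS points to a service account key.
client = bigquery.Client()

# Hypothetical events table: one row per user-related event (click, conversion, refund, ...).
query = """
    SELECT product, event_type, COUNT(*) AS events, COUNT(DISTINCT user_id) AS users
    FROM `analytics_warehouse.events`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY product, event_type
    ORDER BY events DESC
"""

for row in client.query(query).result():
    print(row.product, row.event_type, row.events, row.users)
```

The results of such queries are then connected to Google Data Studio and Tableau as data sources for visualization.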

Step 3: Fine-Tuning Data Gathering Sequence

Before any analytics can happen, data has to be gathered, and it should be handled with care. There has to be a fine-tuned sequence in the data gathering operation so that everything else can work properly.

To collect data from various products, we developed a piece of JavaScript code that gathers data from different sources. It sends the data over for processing and subsequent visualization in Google Data Studio and Tableau.

This approach is not resource-demanding and is highly efficient for the cause, which makes the solution cost-effective.

The whole operation looks like this (a sketch of the processing step follows the list):

  1. Client-side data is gathered by the JavaScript tag
  2. Another part of the data is submitted by the individual products server-to-server
  3. The information is sent to the custom analytics server API, which publishes it to the events stream
  4. The data processing application pulls events from the events stream and performs logical operations on the data
  5. The data processing app stores the resulting data in BigQuery
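
Below is a minimal sketch of steps 4 and 5 using the Apache Beam Python SDK. The project itself was implemented in Java (see the tech stack), and the topic, table, and field names here are assumptions for illustration only.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def enrich(event: dict) -> dict:
    # Placeholder for the "logical operations on data" step.
    event["is_conversion"] = event.get("event_type") == "conversion"
    return event

# Streaming pipeline: pull events from Pub/Sub, process them, store them in BigQuery.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Process" >> beam.Map(enrich)
        | "Store" >> beam.io.WriteToBigQuery(
            "my-project:analytics_warehouse.events",
            # Assumes the destination table already exists in the warehouse.
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```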

Step 4: Cross-Platform Customer/User Synchronization

The central purpose of the system was to show an audience overlap between various products.

Our solution was to apply cross-platform user profiling based on the digital footprint. That gives the system a unified view of the customer, synchronized across the entire product line.

The solution includes the following operations (a simplified sketch follows the list):

  • Identification of the user credentials
  • Credential matching across profiles on different platforms
  • Merging the matched profiles into a unified profile that gathers data across the board
  • Retrospective analysis: analyzing user activity on different products, comparing profiles, and merging the data when there are significant commonalities.
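
A minimal, hypothetical sketch of the credential-matching idea in Python; the field names and the matching rule are assumptions for illustration, not the project’s actual logic.

```python
# Hypothetical per-product profiles keyed by whatever credentials each platform exposes.
profiles = [
    {"product": "site_a", "user_id": "a-17", "email": "jane@example.com", "device_id": "d-9"},
    {"product": "site_b", "user_id": "b-42", "email": "jane@example.com"},
    {"product": "app_c",  "user_id": "c-03", "device_id": "d-9"},
]

def credentials(profile):
    # The digital footprint used for matching: any credential-like field present.
    return {profile[k] for k in ("email", "device_id") if k in profile}

# Group profiles that share at least one credential (naive single-pass matching).
merged = []
for profile in profiles:
    creds = credentials(profile)
    target = next((m for m in merged if m["credentials"] & creds), None)
    if target is None:
        target = {"credentials": set(), "products": set(), "source_ids": []}
        merged.append(target)
    target["credentials"] |= creds
    target["products"].add(profile["product"])
    target["source_ids"].append(profile["user_id"])

# Audience overlap: unified profiles seen on more than one product.
overlap = [m for m in merged if len(m["products"]) > 1]
print(f"{len(overlap)} of {len(merged)} unified profiles appear on multiple products")
```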

Step 5: Maintaining Scalability

The number one priority of any big data operation is the ability to scale according to the required workload.

Data processing is the kind of operation that requires significant resources to be performed properly. It needs speed (approx. 25 GB/h) and efficiency to be genuinely useful in serving its cause.

The system requirements included:

  • Being capable of processing large quantities of data within the required timeframe
  • Being capable of easily integrating new elements
  • Being open to continuous evolution

To provide the best possible environment for scalability, we used the Google Cloud Platform. Its autoscaling features ensure smooth and reliable data processing operations.

To keep the data processing workflow uninterrupted no matter the workload, we used Apache Beam.

Tech Stack

  • Google cloud platform
  • Cloud Pub/Sub
  • Cloud Dataflow
  • Apache Beam
  • Java
  • BigQuery
  • Cloud Storage
  • Google Data Studio
  • Tableau

Project Team

No project would be complete without the team:

  • Project Manager
  • Developer
  • System Architect
  • DevOps + CloudOps

Conclusion

This project can be considered a big milestone for our team. Over the years, we have worked on different aspects of big data operations and developed many projects that involved data processing and analytics. However, this project gave us a chance to create an entire system from the ground up, integrate it with the existing infrastructure, and bring it all to a completely new level.

During the development of this project, we utilized more streamlined workflows that allowed us to complete the turnaround much faster. Because of that, we managed to deploy an operating prototype of the system ahead of the planned date and dedicated more time to its testing and refinement.

Considering developing a custom data analytics system?

Write to us

AI Versus – TV RAIN Conversational User Interface

The way we think creates a picture of the world we are living in, and television has a significant impact on our personas. Imagine twins who were taken to different families and raised by different parents. Even though they have the same DNA, they will have different opinions on politics, culture, and economics. 

ai versus chatbot development case study

We teamed up with developers from ISD GmbH and the advertising agency Voskhod to conduct a similar experiment and show the difference between propaganda and independent news, leveraging Artificial Intelligence capabilities.

ai versus chatbot development team

For this project, we built two neural networks and trained them with different datasets. The first network “watched” the governmental channel Russia-1 for six months, while the second network was trained on programs from the independent television channel Dozhd (TV Rain) broadcast during the same period. To find out how those networks answer the same questions, we integrated both neural networks into a Telegram-powered chatbot. We made it available to the public via an API on the AI VERSUS website. 

chatbot development team the app solutions
ai chatbot development project goal

After six months of training, the chatbots were able to answer users’ questions based on the vocabulary from the programs they had watched. Website visitors can vote for the answers received and share them on social networks. As a result, TV Rain’s AI bot got 93% of users’ votes. 

Project goals and requirements   

We were hired by ISD GmbH to complete the Machine Learning part of the project. We needed to:

  • Select the best possible Machine Learning model for the neural networks
  • Choose and integrate a speech-to-text tool 
  • Train the neural networks with Russia-1 and TV Rain channel programs 
  • Create the user interface to enable the chatbots to answer user questions 

Challenges we overcame 

At the beginning of the project, we faced two main challenges. 

Script-based chatbots do not meet our goals  

The majority of existing conversational agents (chatbots) operate on a pre-made script to perform particular tasks, such as booking tickets or ordering delivery. The purpose of our project was more complex: to create a neural network that would maintain a dialogue with the user like a real person and answer with meaningful phrases without a predefined script. 

Our Solution

In the beginning, we tried to find a ready-trained neural network model but failed to find one. Thus, we focused on developing the Artificial Intelligence from scratch by testing and combining different AI models. The core of the project is the trained neural network, the main component of the system, combined with an algorithm that can identify and build relationships between words and topics from the database. 

The video format is unsuitable for neural network training

TV programs are broadcast in video format, but we could train the neural networks only on text documents. Also, TV programs include different participants who support different ideas, so the neural network had to understand the program’s context.  

Our Solutions 

We decided to concentrate on topics such as politics, social issues, and culture. Then, we selected video programs on the relevant topics and applied Google’s Speech-to-Text tool, which we integrated into our system. The tool helped us identify the program’s language and the number of speakers, break the videos down into abstracts, and record the text to the project’s database. We knew that such an approach might result in some mistakes during text recognition, but it was the best possible option for us. 
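
A minimal sketch of this transcription step with the Google Cloud Speech-to-Text Python client is shown below; the bucket path, audio format, and speaker limits are assumptions for illustration.

```python
from google.cloud import speech

# Assumes the audio has already been extracted from the TV programs
# and uploaded to Cloud Storage; the URI below is hypothetical.
client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri="gs://my-bucket/programs/episode-001.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="ru-RU",
    enable_automatic_punctuation=True,
    # Speaker diarization helps separate participants who voice different ideas.
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=6,
    ),
)

# Long-running recognition is required for audio longer than about a minute.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=3600)

transcript = " ".join(r.alternatives[0].transcript for r in response.results)
print(transcript[:500])
```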

How we did it 

Step 1. Choosing the training model 

At the beginning of the project, we used a Question Answering (Q&A) model. We tried this model on a small number of test datasets and received quite satisfying results for questions on specific topics. This model gathered answers from a limited amount of information in the database. However, to provide users with a real experience of communicating with Artificial Intelligence, we needed to increase the amount of information in the database considerably. Unfortunately, the system was not capable of handling vast amounts of data due to its limited memory. During the second development stage, we tried to overcome the system’s memory limitations, but without results. 

Step 2. Leveraging ODQA model 

During the second phase of our experiment, we leveraged the open-domain question answering (ODQA) model powered by DeepPavlov. DeepPavlov is an open-source conversational AI library built on TensorFlow and Keras. The ODQA model was trained to find answers in a vast database, like Wikipedia. A model with basic settings performed poorly. However, we liked the Word Embedding technology behind the ODQA model. Word Embedding allows us to “embed” the words of a text, i.e., receive their numerical representations, and to conduct mathematical operations on them. Then, by applying the reverse embedding, we get back a meaningful phrase. 

Thus, we decided to modify this model slightly and use it in our project.  
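
As a toy illustration of the word-embedding idea, here is a short sketch using gensim’s Word2Vec (the project’s stack lists Doc2Vec/Word2Vec; the corpus and parameters below are made up for the example).

```python
from gensim.models import Word2Vec

# Tiny stand-in corpus; the real training used six months of program transcripts.
sentences = [
    ["government", "announced", "new", "economic", "reform"],
    ["opposition", "criticized", "economic", "policy"],
    ["channel", "reported", "protest", "in", "the", "capital"],
    ["president", "commented", "on", "the", "reform"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200, seed=1)

# Each word now has a numerical representation ...
vector = model.wv["reform"]

# ... on which we can perform mathematical operations, e.g. finding the nearest words.
print(model.wv.most_similar("reform", topn=3))
```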

Step 3. Integrating a context identification technology 

Word Embedding technology allows our system to find not only similar words, but also similar sentences, abstracts, and contexts. A context is a small part of a TV program on a particular topic containing the knowledge the AI uses during a dialogue with a real user. 

At this stage, we faced the challenge of deciding which context suits a question best. To solve this, we decided to use rankers. After testing several options, we chose to leverage TF-IDF, short for term frequency-inverse document frequency. This numerical statistic reflects how important a word is to a document in a collection or corpus.

The higher the score, the higher the probability that our system will answer with a meaningful phrase. 

To make the AI answer with meaningful phrases, we needed to select the highest-ranked context and pass it to our original Q&A model. 
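
A minimal sketch of such TF-IDF context ranking, using scikit-learn purely for illustration (the contexts and question below are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical contexts: short abstracts extracted from program transcripts.
contexts = [
    "The government announced a new economic reform and tax changes.",
    "The channel covered street protests in the capital.",
    "A cultural program discussed contemporary Russian cinema.",
]

question = "What do you think about the economic reform?"

# Rank contexts by TF-IDF similarity to the question; the best one is
# then passed to the Q&A model as the knowledge it answers from.
vectorizer = TfidfVectorizer()
context_matrix = vectorizer.fit_transform(contexts)
question_vector = vectorizer.transform([question])

scores = cosine_similarity(question_vector, context_matrix)[0]
best = scores.argmax()
print(f"Best context (score={scores[best]:.2f}): {contexts[best]}")
```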

Step 4. Filtering out irrelevant questions 

Unfortunately, our system wasn’t perfect on the first try and failed due to irrelevant questions, a lack of knowledge on some topics in the database, and system errors. At this stage, we needed to improve the system’s ability to identify the topic of a question and make the answers even more meaningful. Thus, we decided to teach the network to identify user questions better and correlate them with topics, which is quite a simple task for a Machine Learning algorithm; its success depends on the quality of the training. We released the existing system for free alpha testing on the Telegram platform to find out which questions matter most to Russian citizens. During this open trial, users asked questions, and the system tried to define the topic. The user could agree with the selected topic or note that the question belonged to a different one. In this way, we successfully managed to filter out irrelevant questions that did not belong to our main topics (politics, economics, and life in the country).
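
A minimal sketch of such a question-topic classifier is shown below; scikit-learn and the toy labelled questions are used only to illustrate the idea, not the project’s actual model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data of the kind collected during open testing:
# questions and the topics users confirmed for them.
questions = [
    "What do you think about the new tax law?",
    "Who will win the next election?",
    "How expensive is life in Moscow?",
    "What is your favourite movie?",
    "Do you like pizza?",
]
topics = ["economics", "politics", "life in the country", "irrelevant", "irrelevant"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(questions, topics)

# Questions classified as "irrelevant" are filtered out before reaching the chatbots.
print(classifier.predict(["What about the economic situation?"]))
```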

Step 5. Getting the chatbot up and running 

At this stage, we excluded irrelevant topics from both neural networks, developed an API to integrate the project into the website, added social sharing buttons so users could share the answers they received on Twitter, Facebook, and VKontakte, and officially released the project. 

Since the project required a lot of space for its operations, we used quite a powerful server with a large amount of RAM. However, we needed to save money on processing resources, which impacted the chatbot’s answering time. On the other hand, such a delay in providing answers makes users feel that the chatbot thinks before answering something meaningful, like a real person. 

Our tech stack 

  • Google Cloud Platform as a hosting environment 
  • Deep Pavlov, Doc2Vec/Word2Vec for neural network development and training
  • Google Speech-to-Text API for converting TV programs into text documents
  • Custom API for database management
ai versus chatbot architecture

[AI Versus architecture view]

The APP Solutions Team Composition

  • 1 Project Manager
  • 2 Software Developers
  • 1 QA Engineer

Results

We developed a comprehensive neural network that could be applied to almost any dataset. Once the project went live, it went viral, and around 193,428 visitors from 16 countries shared over 2 million of AI VERSUS’s answers on their social network profiles. 

ai versus chatbot answers 1
ai chatbot answers 2
ai versus chatbot answers 3
ai versus chatbot answers 4

In this way, together with ISD GmbH and the Voskhod advertising agency, we showed the real difference between propaganda and independent news. We hope that users who communicated with AI VERSUS have become more selective in the content they consume after seeing the difference in the answers of our neural networks. 

We are also proud that the AI VERSUS chatbot project took third place at the Cannes Lions International Festival of Creativity in the Creative Data: Data Storytelling nomination.

Get articles to your email once a month

Subscribe for updates

Case Study: Semantic Search For Improving Customer Support

A positive Customer Experience (CE) is critical for any product or service. CE mirrors the user’s perception of the product. An engaging and robust CE is a sign that everything is going according to plan; it is a direct result of every element of the product working together toward a bigger whole.

Customer experience is what keeps the business going. To achieve quality results (besides caring for the users), marketers and business people use numerous modern technologies, from service personalization and conversational UI to natural language processing and semantic search.

One of the most prominent elements in the grand scheme of customer experience is customer support. Let’s explain why.

Why Customer Service Matters

Customer service is the sign of a responsible company that respects its users. It is one thing to develop a product and leave users on their own, and it is an entirely different thing to step in and help users to understand how to use the product and solve their emerging issues.

The direct purpose of customer support is to solve emerging user issues appealingly and constructively. The nature of these problems varies widely. It might be:  

  • Minor misunderstanding of the product’s design scheme.
  • Incorrect use of the product.
  • Actual technical problems of the product.  

Customer service is also a user feedback forum designed to research the general perception of the product and find out which features and elements require further improvement and polishing.  

However, the workflow of the customer support department is not without its issues.

  • When the product is widely used, the amount of various user feedback is inevitably overwhelming, and it takes time to sort it out and process it.
  • It takes time to train new operators and get them into the ins and outs of the product.
  • It also often takes a lot of time and effort to interact with the customer to figure out the nature of the issue.
  • Then, it takes some time to find appropriate responses from the knowledge base.

Because of that, customer support often gets a lot of criticism for trying to be what it is. But there is a solution.

How did we manage to improve customer support service by implementing semantic search features?

How to Improve Customer Support with Semantic Search? MSP Case Study

The Project’s Setup

MSP was a customer support system that handled user feedback and solved emerging issues across the product line-up of the client. The system was barely capable of doing its job, and the overall workflow was too clumsy.

There was a notion that the current MSP workflow was holding the system back, and that directly affected the quality of the customer service. It was apparent that the system simply wasn’t designed for the scope it was handling.

The issues included:

  • The response time was too slow during high-workload periods, which resulted in bottlenecks.
  • The system slowed down considerably during high-load periods.

Because of that, it was decided to make an overhaul of the system:

  • Make it more scalable to process large quantities of data
  • Streamline and optimize the workflow, so that the operators would have easy access to the knowledge base and deliver responses to the customers in less time.
  • Simplify the training process of the new employees by letting them study the existing knowledge base with ease.

The APP Solutions was approached to develop a suitable solution that would handle these issues.

Our global task on the project was to develop a flexible solution that would streamline the workflow of the customer support operation and at the same time, make it more valuable in terms of insights.

On the ground level, the tool would simplify the workflow of the customer support operator and make it easier for them to find the right answers in the database in a short time.

In addition to that, we wanted to provide an analytical component that would show the trends and uncover insights into the customer issues in an understandable form.

Implementing fundamental natural language processing features also allowed us to analyze the database and extract valuable insights regarding product use and related issues.

Step 1: Figuring out the right workflow optimization tool

The main challenge of the project was to figure out how to make a tool that would streamline the workflow of the customer support operator and make it as efficient as possible. The critical requirement was accessibility.

It was apparent that the solution required an application of natural language processing, and the semantic search feature seemed perfect for the cause. This approach enables finding relevant information in the database from basic input: the operator only needs to give the system a query, and the system finds what they are looking for.

The question was which approach was the most fitting for the task.

To determine the best approach – we have tried and tested several natural language processing tools:

  • At first, we tried ELMo embeddings. The results were okay, but the performance left a lot to be desired.
  • Then we tried GloVe. It was good but too complicated for its purpose. We needed a leaner solution.
  • Finally, we tried Doc2Vec, which proved to be the optimal solution both in terms of performance and overall flexibility.

Step 2: Developing the semantic search model

After figuring out the optimal solution, we needed to make it work within the customer support database.

The primary function of the application was to find the dialogues and responses in the database that are most relevant to the input query. This information would serve as a basis for further customer-operator interaction and save a significant amount of time in the process.

The model was trained on a body of texts made from customer support conversations. The backbone of the model was developed with Python’s NLTK.

To configure the system, we applied TF-IDF scoring to determine relevance to the user’s queries. In addition, we performed bigram and trigram corpus analysis and a basic polarity check, which provided a foundation for further database processing with Doc2Vec. After that, the application was deployed as a Flask API.
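
A minimal sketch of how such a Doc2Vec-backed search could be served through a Flask API; gensim, the toy corpus, and the endpoint name are assumptions for illustration, not the project’s actual code.

```python
from flask import Flask, jsonify, request
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy stand-in for the real body of customer support conversations.
corpus = [
    "I was charged twice for my subscription, please refund the extra payment",
    "The app crashes every time I open the settings screen",
    "How do I change the email address on my account",
]

documents = [TaggedDocument(text.lower().split(), [i]) for i, text in enumerate(corpus)]
model = Doc2Vec(documents, vector_size=50, min_count=1, epochs=100)

app = Flask(__name__)

@app.route("/search")
def search():
    # Infer a vector for the operator's query and return the most similar dialogues.
    query = request.args.get("q", "")
    vector = model.infer_vector(query.lower().split())
    hits = model.dv.most_similar([vector], topn=2)
    return jsonify([{"dialogue": corpus[doc_id], "score": float(score)} for doc_id, score in hits])

if __name__ == "__main__":
    app.run()
```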

Step 3: Optimizing the semantic search model

The secret of effective semantic search is thorough testing and continuous optimization of the model. NLP applications are like diamonds in the rough: even the simplest models require polishing to do their job as intended.

The critical factor is the flexibility of the model in recognizing the specific aspects of the text and finding matches for the user’s queries.

This process required the use of several training scripts and a comparative analysis of the results. Each script was designed for a specific detail (a small sketch of the tone check follows the list):

  • One script analyzed the general context of the conversation
  • Another extracted keywords relevant to the service
  • A third determined the tone of the discussion at the start, throughout the interaction, and after the issue was solved.
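
For the tone check, a minimal polarity sketch with NLTK’s VADER analyzer might look like this (VADER and the sample utterances are assumptions for illustration; the project is only described as using NLTK with a basic polarity check).

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

analyzer = SentimentIntensityAnalyzer()

# Toy conversation: tone at the start, during the interaction, and after resolution.
stages = {
    "start": "This is ridiculous, I have been waiting for a refund for two weeks!",
    "middle": "Okay, I see, so you need my order number to look it up.",
    "end": "Thank you so much, the refund arrived, great support!",
}

for stage, utterance in stages.items():
    # The compound score ranges from -1 (negative) to +1 (positive).
    score = analyzer.polarity_scores(utterance)["compound"]
    print(f"{stage}: {score:+.2f}")
```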

After a thorough polish, the system became flexible enough to find a selection of relevant replies even from the broadest queries.

Step 4: Ensuring scalability

Slow responses are one of the chief complaints about customer support services. Delivering results on both ends on time is one of the critical factors of efficient operation. This required the ability of the system to scale according to the workload.

From a business standpoint, scalability is one of the pillars of well-oiled and smooth customer service. In the context of semantic search, this means being able to process a large amount of data and find the relevant selection in a short time.

To make the system capable of processing multiple database requests and delivering fast results, we used Google Cloud Platform’s autoscaling features.

With this in place, scalability became a non-issue, so the users could focus on issues that are more integral to their working process.

Read also: Classical Artificial Neural Networks

Tech Stack

  • Google Cloud Platform
  • NLTK for Python
  • Flask API
  • Doc2Vec

Personnel

  • Project manager
  • Web developer

Conclusion

The MSP project was an excellent showcase of how to find a reliable and straightforward solution to the complex problem of customer support workflow.

It was an opportunity for our team to implement new skills in practice and develop a solution for this type of project.

Another important thing is that we managed to realize the project in a relatively short time frame. In just over a month, we already had a working prototype, and by the end of the second month, the system was fully operable.

Calculate the development cost of your app

Receive a fee estimate