Healthcare and data science are something of a perfect pair. Healthcare operations require insights into patient data to function at a practical level. At the same time, data science is all about getting deep into data and finding all sorts of interesting things.
The combination of these two resulted in the adoption of Electronic Health Records (EHR) that use a data science toolkit for the benefit of medical procedures.
In addition to this, healthcare data is well suited to machine learning algorithms that streamline workflows, modernize database maintenance, and increase the accuracy of results.
In this article, we will explain what EHR is and how machine learning makes it more effective.
An Electronic Health Record (EHR) is a digital compendium of all available patient data gathered into one database.
The information in an EHR includes medical history and treatment record data: diagnoses, medications, treatment plans, immunization dates, allergies, radiology images, and laboratory and test results.
- The adoption of EHR in the industry kicked off in the late 90s, after HIPAA (Health Insurance Portability and Accountability Act) was signed into law in 1996.
- However, due to technological limitations, things proceeded slowly.
- The technology received a significant boost after the passing of the HITECH (Health Information Technology for Economic and Clinical Health) Act in 2009, which specified the whats, whys, and hows of EHR implementation.
The main goal of implementing EHR is to expand the view of patient care and increase efficiency of treatment.
In essence, EHR is the good old paper patient chart expanded into a full-blown, interactive data science dashboard with real-time updates, where you can examine the information and also perform various analytical operations.
- Think of it as a Google Account type of thing, where your data is gathered in one place and you can use it for multiple purposes with tools like Office 365 or the like.
The critical characteristics of Electronic Health Records are:
- Availability - EHR data is organized and updated in real-time for further data science operations, such as diagnostics, descriptive analytics, predictive analytics, and, in some cases, even prescriptive analytics. It is available at all times and shared with all required parties involved in a patient’s care - such as laboratories, specialists, medical imaging labs, pharmacies, emergency facilities, etc.
- Security - the information is accessed and modified only by authorized users. All patient data is protected by extensive access management protocols, encryption, anonymization, and data loss protection routines.
- Workflow optimization - EHR features can automate routine, recurrent procedures and streamline the provider workflow. In addition to this, EHR automation can help meet healthcare data processing regulations such as HITECH and HIPAA (USA) and PIPEDA (Canada) by implementing the required protocols during data processing.
There is also another type of electronic record system used in healthcare operations - the Electronic Medical Record (EMR).
The main difference between EHR and EMR is whom the record is centered on.
- EMR is a digital version of the dataflow in the clinician’s office. It revolves around a specific medical professional and contains treatment data of numerous patients within the specialist’s practice.
- In contrast, EHR data revolves around a specific patient and their medical history.
In one way or another, an EHR intertwines with numerous EMRs within its workflow. Data moves back and forth between them - medical histories, examination data, test results, time-based data comparisons, and so on.
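To make the difference in focus concrete, here is a minimal sketch of how clinician-centered entries could be regrouped into a patient-centered view. The field names (`clinician`, `patient_id`, `date`, `note`) and the sample rows are assumptions made up for illustration, not a real EHR schema.

```python
from collections import defaultdict

# Hypothetical EMR entries: each clinician's system holds rows for many patients.
emr_entries = [
    {"clinician": "dr_adams", "patient_id": "p1", "date": "2023-01-10", "note": "annual physical"},
    {"clinician": "dr_baker", "patient_id": "p2", "date": "2023-02-02", "note": "allergy test"},
    {"clinician": "dr_baker", "patient_id": "p1", "date": "2023-03-15", "note": "lipid panel"},
]

def build_patient_view(entries):
    """Group clinician-centered entries by patient and order each history by date."""
    by_patient = defaultdict(list)
    for entry in entries:
        by_patient[entry["patient_id"]].append(entry)
    for history in by_patient.values():
        history.sort(key=lambda e: e["date"])  # ISO dates sort correctly as strings
    return dict(by_patient)

patient_view = build_patient_view(emr_entries)
```

Here `patient_view["p1"]` holds the entries from both clinicians in chronological order, which is roughly the perspective an EHR offers.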
Read a more detailed overview of EHR/EMR differences in the article EHR, EMR and PHR Differences
As mentioned above, the availability of data is one of the primary benefits of bringing Electronic Health Records into medical proceedings.
Aside from keeping data available to medical professionals at all times, the way an EHR structures medical data makes it a good fit for various machine learning-fueled data science operations.
Overall, machine learning is a viable option in the following aspects of EHR:
- Data Mining
- Natural Language Processing
- Medical Transcription
- Document Search
- Data Analytics
- Data Visualization
- Predictive Analytics
- Privacy and regulatory compliance
Let’s look at them one by one.
Gathering valuable insights is one of the essential requirements for providing efficient medical treatment. One of the challenges that come with gaining insights is that, in order to do that, you need to go through a lot of data. This process takes a lot of time.
As the volume and complexity of data generated by medical facilities grow, using machine learning algorithms to process and analyze information during data mining becomes a necessity.
Overall, the use cases for data mining in EHR revolve around two approaches with different scopes:
- Finding data about a patient and their treatment. In this case, ML is used to round up relevant information in the medical history and treatment record to support the decision-making process.
  - Patient-centered data mining is also used to assess different types of treatment and their outcomes by studying similar cases from the broader EHR database.
- Data extraction for medical research across multiple EHRs/EMRs and public health datasets. In this case, a machine learning application is used to gather relevant data based on specific terms and outcomes across the EHR database. For example, to determine which types of medication for particular ailments proved effective and under what circumstances.
  - The same tools apply to exploratory research that reshapes available data according to specific requirements, for example, examining test result patterns in annual lipid profiles.
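As a rough illustration of the research-style query described above (which medication worked for a given ailment, and under what circumstances), here is a minimal sketch in plain Python. The records, field names, and the `improvement_rates` helper are hypothetical. A real pipeline would query the EHR database and would need proper statistical treatment of such small groups.

```python
from collections import defaultdict

# Hypothetical, already de-identified treatment records.
records = [
    {"diagnosis": "hypertension", "medication": "drug_a", "age": 54, "improved": True},
    {"diagnosis": "hypertension", "medication": "drug_a", "age": 71, "improved": False},
    {"diagnosis": "hypertension", "medication": "drug_b", "age": 66, "improved": True},
    {"diagnosis": "hypertension", "medication": "drug_b", "age": 69, "improved": True},
    {"diagnosis": "asthma", "medication": "drug_c", "age": 30, "improved": True},
]

def improvement_rates(records, diagnosis, min_age=0):
    """Return {medication: share of improved cases} for one diagnosis and age floor."""
    counts = defaultdict(lambda: [0, 0])  # medication -> [improved, total]
    for r in records:
        if r["diagnosis"] == diagnosis and r["age"] >= min_age:
            counts[r["medication"]][0] += int(r["improved"])
            counts[r["medication"]][1] += 1
    return {med: improved / total for med, (improved, total) in counts.items()}

rates_all = improvement_rates(records, "hypertension")
rates_65_plus = improvement_rates(records, "hypertension", min_age=65)
```

The `min_age` filter stands in for the "under what circumstances" part: the same query can be narrowed to any subgroup the research question calls for.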
EHR is all about making data analytics more efficient. One of the most important innovations brought by EHR is a streamlined data pipeline that feeds further transformations.
Machine learning-fueled data processing in EHR provides a foundation for identifying patterns and detecting tendencies that occur across the numerous tests and examinations of a specific patient, spread over multiple health records.
- With all patient data and the respective reference databases intertwined into a single sprawling system, one can use what is already recorded to predict possible outcomes.
- Predictive analytics assists the doctor's decision-making process by providing more options when considering possible courses of action.
- On the other hand, machine learning predictive analytics reduces the time required to pro.
Predictive analytics models are trained case-by-case on the EHR databases. The accumulation of diverse data allows them to identify common patterns and outliers regarding certain aspects of disease development or a patient’s reaction to different treatment methods.
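To give a feel for learning from accumulated cases, here is a minimal sketch of a nearest-neighbor estimate: the chance that a new patient responds to a treatment is taken from the outcomes of the most similar historical cases. The features, the data, and the `predict_response` helper are assumptions for illustration only. Production models are far richer and are validated clinically.

```python
import math

# Hypothetical historical cases: (age, baseline lab value) -> responded to treatment?
history = [
    ((45, 1.2), True),
    ((50, 1.4), True),
    ((47, 1.1), True),
    ((72, 3.1), False),
    ((68, 2.9), False),
    ((75, 3.4), False),
]

def predict_response(history, patient, k=3):
    """Share of positive outcomes among the k historical cases closest to the patient."""
    # Features are left unscaled to keep the sketch short; real features with
    # different units should be normalized before measuring distance.
    nearest = sorted(history, key=lambda case: math.dist(case[0], patient))[:k]
    return sum(1 for _, responded in nearest if responded) / k

p_younger = predict_response(history, (48, 1.3))
p_older = predict_response(history, (70, 3.0))
```

The returned share is only as good as the cases behind it, which is why the breadth of the EHR database matters so much for this kind of model.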
Let’s take DNA Nanopore Sequencing as an example.
- The system combines input data (coming from the patient) with data about the illness and ways of treating it.
- The predictive algorithm determines whether a particular treatment match will result in a positive outcome, and to what extent. (You can read more about Nanostream in our case study.)
In one way or another, natural language processing is involved in the majority of EHR-related operations. The reason for that is simple: most medical record documentation is in a textual form combined with different graphs and charts to illustrate points.
- Why not use simple text search instead? Well, while the structure of a document is more or less uniform across the field, the manner of presentation varies from specialist to specialist. An NLP solution provides more flexibility in that regard.
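The variation problem is easy to see in a small example. The sketch below pulls one lab value out of notes that spell the same term three different ways. It uses a hand-written pattern table as a stand-in for a trained model, so `TERM_PATTERNS` and `extract_values` are illustrative assumptions rather than how any particular NLP product works.

```python
import re

# Hypothetical synonym table; a production system would rely on a trained
# named-entity recognition model and a clinical vocabulary instead.
TERM_PATTERNS = {
    "ldl_cholesterol": re.compile(
        r"\b(?:LDL(?:-C)?|low[- ]density lipoprotein)\b[^0-9]{0,20}(\d+(?:\.\d+)?)",
        re.IGNORECASE,
    ),
}

def extract_values(note, term):
    """Return the numeric values that follow any known spelling of the term."""
    return [float(m) for m in TERM_PATTERNS[term].findall(note)]

notes = [
    "LDL: 131 mg/dL, HDL: 52 mg/dL",
    "Low-density lipoprotein measured at 118.5 mg/dL.",
    "ldl-c was 140",
]
values = [extract_values(n, "ldl_cholesterol") for n in notes]
```

A plain search for "LDL" would miss the second note entirely, and every new spelling would need another hand-written rule, which is the gap a trained model is meant to close.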
The main NLP use cases for EHR are the following:
- Document Search - both as part of the broader data mining operation and simply as an internal navigation tool. In this case, the system uses a named-entity recognition model trained on a set of specific terms and designations related to different types of tests and examinations. As a result, doctors save time finding relevant information in vast volumes of data. Depending on the purpose, the search results are formed via the following methods:
  - By context - locating information within the document, i.e. vanilla document search. For example, you can compare physical examination reports criterion by criterion.
  - Terms / Topics / Phrases - extracting instances of specific terms used or topics mentioned. For example, a doctor can obtain all blood test results and put them into perspective.
  - Search across multiple documents.
  - One of the most prominent current applications is the Linguamatics I2E platform, which also provides data visualization features.
- Medical transcription - in this case, NLP is used to recognize speech and subsequently format it in an appropriate way (for example, break it down into segments by context).
  - The speech-to-text component operates with a set of commands like “new line” or “new paragraph.”
  - Nuance Communications makes one of the most prominent products in this category. Its tool, Nuance Dragon, augments EHR with a conversational interface that assists with filling data into the record.
- Report generation - in this case, NLP functions as a form of data visualization in textual form. These models are trained on existing reports and operate on specific templates (for example, for blood test results). Due to the highly formalized language of the reports, it is relatively easy to train a generative model based on term and phrase collocation and correlation.
  - In this case, the correct verbiage is derived from the habitual juxtaposition of a particular word with another word or words at a frequency higher than chance (collocation) and from the extent to which two or more variables fluctuate together (correlation).
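The collocation idea can be sketched with pointwise mutual information (PMI), a common way to score how much more often two words appear together than chance would predict. This is one possible measure chosen for illustration, and the sample sentences are made up. The article does not say which statistic a given report generator uses.

```python
import math
from collections import Counter

# Hypothetical report sentences, lowercased and split on whitespace.
sentences = [
    "white blood cell count within normal range",
    "red blood cell count within normal range",
    "platelet count slightly below normal range",
    "blood pressure elevated",
]

tokens_per_sentence = [s.split() for s in sentences]
unigrams = Counter(t for toks in tokens_per_sentence for t in toks)
bigrams = Counter(
    pair for toks in tokens_per_sentence for pair in zip(toks, toks[1:])
)
n_unigrams = sum(unigrams.values())
n_bigrams = sum(bigrams.values())

def pmi(w1, w2):
    """log2 of the observed adjacent-pair probability over the chance expectation."""
    p_pair = bigrams[(w1, w2)] / n_bigrams
    if p_pair == 0:
        return float("-inf")  # the pair never occurs in this order
    p1 = unigrams[w1] / n_unigrams
    p2 = unigrams[w2] / n_unigrams
    return math.log2(p_pair / (p1 * p2))
```

A positive score, as for `pmi("normal", "range")`, marks a pair that co-occurs more often than chance, while a pair that never appears in that order, such as `pmi("range", "normal")`, scores negative infinity.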
Data visualization is another important aspect of data analytics brought to its full extent with the implementation of Electronic Health Records.
Visualization is one of the critical components that make EHR more effective in terms of accessibility and availability of data for various data science operations.
- The thing is, an electronic health record is basically a giant graph with lots of raw data on different aspects of the patient’s state, and it is not practical to use in that raw form. The role of visualization is to make the data more accessible and understandable for everyday purposes.
However, you can’t use the same data visualization template for every EHR. While the framework remains the same, it requires room for customization to visualize patient data on the EHR dashboard adequately.
The role of machine learning in this operation parallels its role in data mining. However, in the case of data visualization, it is about interpreting data in an accessible form.
Currently, one of the most frequently used visualization libraries in EHR is D3.js. For example, we have used its sunburst and pie charts in the Nanostream project.
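Hierarchical charts such as sunbursts are usually fed nested data, so part of the work happens before anything is drawn. The sketch below reshapes flat rows into a nested name/children/value layout and serializes it for a front end. The rows, the `to_hierarchy` helper, and the field names are assumptions for illustration, not the structure used in the Nanostream project.

```python
import json

# Hypothetical flat rows: (department, procedure, number of records).
rows = [
    ("cardiology", "ECG", 12),
    ("cardiology", "stress test", 4),
    ("laboratory", "lipid panel", 9),
]

def to_hierarchy(rows, root_name="patient records"):
    """Nest flat rows into a name/children/value tree for hierarchical charts."""
    groups = {}
    for department, procedure, count in rows:
        groups.setdefault(department, []).append({"name": procedure, "value": count})
    return {
        "name": root_name,
        "children": [
            {"name": department, "children": items}
            for department, items in groups.items()
        ],
    }

tree = to_hierarchy(rows)
payload = json.dumps(tree)  # JSON a charting front end could consume
```

Keeping this reshaping step separate from the chart itself is one way to leave room for the per-dashboard customization mentioned above.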
Healthcare is an industry that operates with sensitive data through and through. Pretty much every element of healthcare operation, in one way or another, touches on privacy and confidentiality.
The fact is that integrated systems like EHR are vulnerable to breaches, data loss, and other unfortunate things that may happen to data in the digital realm.
In addition to that, healthcare proceedings are bound by government regulations that detail the ins and outs of personal data gathering, processing, and storing in general, and specifically in the context of healthcare.
Regulations such as the European Union’s GDPR, Canada’s PIPEDA, and the United States’ HIPAA describe how to handle sensitive personal data and what the consequences of mishandling it are.
The implementation of EHR makes compliance with these regulations much more convenient as it allows us to automate much of the compliance workflow. Here’s how:
- Anonymization during data processing - in this case, patient data is prepared for processing or testing, while identifying elements that are not needed for the task, such as names, are concealed.
- Access management - the EHR structure limits access to patient data to those involved in a patient's treatment.
- Encryption of data both at rest and in transit - the goal is to prevent any outside interference with data processing.
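The first two measures can be sketched in a few lines. Note that the example performs pseudonymization (replacing the patient ID with a keyed hash) rather than full anonymization, and that the key, the field names, and the `care_team` check are hypothetical. Encryption is left out because it is normally handled by the storage and transport layers rather than by application code like this.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key management service.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"
# Fields treated as direct identifiers in this sketch.
REMOVED_FIELDS = {"patient_id", "name", "phone"}

def pseudonymize(record):
    """Drop direct identifiers and add a stable keyed-hash token for the patient."""
    token = hmac.new(
        PSEUDONYM_KEY, record["patient_id"].encode(), hashlib.sha256
    ).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in REMOVED_FIELDS}
    cleaned["patient_token"] = token
    return cleaned

def can_access(user, record):
    """Allow access only to users listed on the patient's care team."""
    return user in record.get("care_team", ())

record = {
    "patient_id": "p1",
    "name": "Jane Doe",
    "phone": "555-0100",
    "diagnosis": "asthma",
    "care_team": ["dr_adams"],
}
safe_record = pseudonymize(record)
```

Because the token is derived with a secret key, the same patient maps to the same token across datasets, yet the original ID cannot be recomputed by someone who lacks the key.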
The adoption of electronic health records and the implementation of machine learning elevate healthcare operations to a new level.
On the one hand, it expands the view on patient data and puts it into the broader context of healthcare proceedings.
On the other hand, machine learning-fueled EHR provides doctors with a much more efficient and transparent framework for data science that results in more accurate data and deeper insights into it.