Hi everyone! This is the first of our two-part blog series about data streaming in a healthcare application. Recent times have shown that digital healthcare is one of the most highly regulated sectors across the globe.
Data analytics is of great significance to the healthcare industry, where data is considered the most valuable asset. It covers all aspects of healthcare research and putting that research into practice.
In this first part, we focus on the challenges we faced while streaming healthcare data for analytics.
Health data sets are growing rapidly, and the upward trend is very prominent. With the evolution of Artificial Intelligence/Machine Learning, we can do wonders in healthcare if we have a clean and correct set of data.
Data analytics on healthcare data is not very straightforward. We cannot use the raw data directly from the database. We have to think from the perspective of compliance and other security aspects, since the data contains a lot of sensitive information. We need to filter out all patient Personally Identifiable Information (PII) from the data before running any analytics on it.
Analytics really helps people in the healthcare sector enhance the patient experience, spot prevailing and circulating trends, and improve the overall quality of patient care.
How it started
We once got a requirement from one of our clients to stream the application data (patient information) from the production database into another database where data scientists could perform their analytics.
As said earlier, we couldn't use the data directly from the database. We ran into trouble, as always :P. Let me explain.
The challenges we faced:
- Filter out ePHI:
  - When you work with healthcare applications, the data contains both PHI and non-PHI. When you stream data into the analytics database, you need to make sure all the PHI is filtered out.
- Data stream:
  - Our application is growing rapidly, and we receive data from multiple sources. Whenever data lands in our production database, it has to be streamed to the analytics database, on the condition that this causes no interruption for our regular users.
- Data transform:
  - The table structure in our application is not well suited to analytics. The same table has to be transformed into different structures that suit the analytics database.
  - So we built a middleware that transforms our data into a structure suited to the analytics database.
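To make the transformation challenge a bit more concrete, here is a minimal sketch of reshaping one application row into flatter, analytics-friendly rows. This is not our actual middleware: the column names (`visit_id`, `vitals`, `recorded_at`) and the `to_analytics_rows` helper are hypothetical and exist only for illustration.

```ruby
# Hypothetical example: one application row stores several measurements
# in a nested hash, while the analytics side wants one row per measurement.
def to_analytics_rows(row)
  row.fetch(:vitals).map do |name, value|
    {
      visit_id: row.fetch(:visit_id),
      recorded_at: row.fetch(:recorded_at),
      metric: name.to_s,
      value: value
    }
  end
end

# Made-up sample row (no real patient data).
app_row = {
  visit_id: 42,
  recorded_at: "2020-01-01T10:00:00Z",
  vitals: { heart_rate: 72, temperature_c: 36.8 }
}

analytics_rows = to_analytics_rows(app_row)
analytics_rows.each { |r| puts r.inspect }
```

The idea is simply that the shape stored by the application and the shape queried by analysts can differ, so a small mapping step sits between the two.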
The first challenge for us was to filter out the PHI data while it is being pushed from our healthcare application to the analytics application. To overcome this challenge, we built our own Ruby gem.
Once the gem is installed, it has to be included in the models (tables) that need to be pushed. We might not need all the fields of a table from our database, so the columns that need to be pushed are defined in the model and sent on to the next process.
Whenever a change happens in those tables, Cross Stream collects the defined columns and sends them to an endpoint. We can configure a different endpoint, or multiple endpoints, for the data.
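The gem's internals are not shown in this post, so the following is only a rough sketch of how such a declaration could work, under a few assumptions: the module name `CrossStreamSketch`, the `push_to_stream` method, and the idea that an endpoint is any object responding to `call(payload)` are all hypothetical. A plain `Struct` stands in for a model so the example runs without Rails.

```ruby
# Hypothetical sketch of a `cross_stream`-style class macro.
# None of these names are taken from the real gem.
module CrossStreamSketch
  # Endpoints are any objects that respond to `call(payload)`.
  def self.endpoints
    @endpoints ||= []
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare the allow-list of columns that may leave the application.
    def cross_stream(*columns)
      @streamed_columns = columns
    end

    def streamed_columns
      @streamed_columns || []
    end
  end

  # Meant to be called from a change hook (for example an after-commit callback).
  def push_to_stream
    payload = self.class.streamed_columns.to_h { |col| [col, public_send(col)] }
    CrossStreamSketch.endpoints.each { |endpoint| endpoint.call(payload) }
    payload
  end
end

# A plain Ruby stand-in for a model; `name` plays the role of a PHI column.
Patient = Struct.new(:id, :name, :column_1, :column_2) do
  include CrossStreamSketch
  cross_stream :id, :column_1, :column_2
end

received = []
CrossStreamSketch.endpoints << ->(payload) { received << payload }

Patient.new(1, "Jane Doe", "a", "b").push_to_stream
puts received.inspect
```

Because only the declared columns are copied into the payload, a column that is never listed (here `name`) cannot reach any endpoint by accident.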
Currently the gem pushes the data into AWS Kinesis, a service that makes it easy to collect, process, and analyze real-time streaming data.
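As a hedged illustration of what pushing to Kinesis could look like, here is a small endpoint object that forwards a payload through a client's `put_record` call. The class name `KinesisEndpointSketch`, the stream name, and the choice of `id` as partition key are assumptions; a fake client is used so the sketch runs without AWS credentials.

```ruby
require "json"

# Hypothetical endpoint that forwards a payload to a Kinesis data stream.
# `client` is expected to behave like Aws::Kinesis::Client from the
# aws-sdk-kinesis gem; stream name and partition key are assumptions.
class KinesisEndpointSketch
  def initialize(client:, stream_name:)
    @client = client
    @stream_name = stream_name
  end

  def call(payload)
    @client.put_record(
      stream_name: @stream_name,
      data: JSON.generate(payload),
      # Records with the same partition key land on the same shard.
      partition_key: payload.fetch(:id).to_s
    )
  end
end

# A fake client that just records what it was asked to send.
class FakeKinesisClient
  attr_reader :records

  def initialize
    @records = []
  end

  def put_record(**params)
    @records << params
  end
end

fake_client = FakeKinesisClient.new
endpoint = KinesisEndpointSketch.new(client: fake_client, stream_name: "analytics-stream")
endpoint.call({ id: 1, column_1: "a", column_2: "b" })
puts fake_client.records.inspect
```

With the real SDK, a configured `Aws::Kinesis::Client` would be passed as `client` in place of the fake one.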
In a model, the columns to push are declared like this:

cross_stream :id, :column_1, :column_2
With that, we come to the end of the first part. In the next part, I will explain the further steps that happen after the data is pushed into AWS Kinesis.
Want to know how we overcame the other two challenges and built a real-time data stream for a healthcare application?
Then wait for part ✌🏻, coming soon 🙂