Centene

Centene Corporation is a leading healthcare enterprise committed to helping people live healthier lives. Centene offers affordable and high-quality products to more than 1 in 15 individuals across the nation, including Medicaid and Medicare members.
Role
Machine Learning Intern
Duration
June 2023 - September 2023
Team
Machine Learning Intern

Centene was looking for a Machine Learning Intern to develop and maintain machine learning models to improve the efficiency of their healthcare services.

Deliverables
  1. Designed a Named-Entity Recognition (NER) model with spaCy and Python to extract medical information from unstructured medical text, resulting in savings worth $25,000,000.
  2. Built a real-time ETL pipeline that aggregates unstructured data from more than 10 sources, moving it from Databricks into PostgreSQL.
  3. Dockerized the model, making it available to 8 different internal teams.
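
To make the first deliverable concrete, here is a minimal sketch of entity extraction with spaCy. It is not the model built at Centene: it assumes spaCy v3 and uses a blank English pipeline with a rule-based `entity_ruler` in place of a trained NER component. The labels (`MEDICATION`, `CONDITION`, `DOSAGE`), the patterns, and the sample note are all hypothetical.

```python
import spacy

# Hypothetical labels and token patterns, for illustration only.
PATTERNS = [
    {"label": "MEDICATION", "pattern": [{"LOWER": "metformin"}]},
    {
        "label": "CONDITION",
        "pattern": [{"LOWER": "type"}, {"TEXT": "2"}, {"LOWER": "diabetes"}],
    },
    {"label": "DOSAGE", "pattern": [{"LIKE_NUM": True}, {"LOWER": "mg"}]},
]


def build_pipeline():
    """Create a blank English pipeline with a rule-based entity ruler."""
    nlp = spacy.blank("en")  # tokenizer only, no trained components
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(PATTERNS)
    return nlp


def extract_entities(nlp, text):
    """Return (entity text, label) pairs in document order."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


nlp = build_pipeline()
note = "Patient with type 2 diabetes was prescribed metformin 500 mg daily."
entities = extract_entities(nlp, note)
for text, label in entities:
    print(f"{label}: {text}")
```

A rule-based ruler like this is often a starting point for bootstrapping labels; a production system would more likely train a statistical NER component on annotated clinical text.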

Reflection & Takeaways

My Learnings

One of the most impactful lessons from this project was how complex unstructured medical data is to work with. Building a Named-Entity Recognition (NER) model with spaCy and Python taught me the importance of precision and scalability. I realized that handling real-world data takes more than technical expertise; it also requires an understanding of domain-specific nuances. The experience gave me a deeper appreciation for the delicate balance between data processing and interpretation, especially when the stakes involve medical information.

Another key takeaway was the value of building scalable and efficient systems. Designing the real-time ETL pipeline to aggregate data from Databricks into PostgreSQL reinforced how critical it is for data to flow seamlessly across platforms, especially when pulling from multiple sources. This experience expanded my skills in managing real-time data pipelines and working across different data environments. Dockerizing the model for multiple teams was also a significant lesson in making complex models accessible and reusable across diverse environments while keeping the solution scalable and maintainable.
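
The extract, transform, and load stages described above can be sketched in a few lines. This is a simplified batch illustration and not the actual pipeline: it assumes the sources deliver JSON strings, uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs without a database server, and reads from in-memory lists instead of Databricks. The source names, the `notes` table, and the record fields are hypothetical.

```python
import json
import sqlite3


def extract(sources):
    """Yield (source name, raw record) pairs from every source."""
    for source_name, raw_records in sources.items():
        for raw in raw_records:
            yield source_name, raw


def transform(source_name, raw):
    """Parse one raw JSON record and normalize whitespace in its note."""
    record = json.loads(raw)
    note = " ".join(record.get("note", "").split())
    if not note:
        return None  # drop records that carry no text
    return (source_name, str(record["id"]), note)


def load(conn, rows):
    """Upsert the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "source TEXT, record_id TEXT, note TEXT, "
        "PRIMARY KEY (source, record_id))"
    )
    conn.executemany("INSERT OR REPLACE INTO notes VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, sources):
    """Run extract, transform, load and return the number of rows loaded."""
    rows = []
    for source_name, raw in extract(sources):
        row = transform(source_name, raw)
        if row is not None:
            rows.append(row)
    load(conn, rows)
    return len(rows)


# Hypothetical sources standing in for upstream feeds.
sources = {
    "claims": ['{"id": 1, "note": "  Follow-up   visit scheduled. "}'],
    "intake": [
        '{"id": 7, "note": "New patient intake."}',
        '{"id": 8, "note": ""}',
    ],
}
conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
loaded = run_pipeline(conn, sources)
print(f"Loaded {loaded} rows")
```

Keying the table on source and record ID makes the load step idempotent, so replaying a batch after a failure does not create duplicates.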

Technologies Used

Database
PostgreSQL
Machine Learning
Python
spaCy
Databricks
Data Processing
ETL
Docker