My Learnings
One of the most impactful lessons I learned during this project was the complexity of working with unstructured medical data. Building a Named-Entity Recognition (NER) model using SpaCy and Python taught me the importance of precision and scalability. I realized that handling real-world data requires more than just technical expertise; it requires an understanding of domain-specific nuances. The experience gave me a deeper appreciation for the delicate balance between data processing and interpretation, especially when the stakes involve medical information.
Another key takeaway was the value of building scalable and efficient systems. Designing the real-time ETL pipeline to aggregate data from Databricks to PostgreSQL reinforced how critical it is to ensure that data flows seamlessly across platforms, especially when pulling from multiple sources. This experience expanded my skills in managing real-time data pipelines and understanding the nuances of different data environments. Additionally, Dockerizing the model for multiple teams was a significant learning experience in making complex models accessible and reusable across diverse environments, ensuring the scalability and maintainability of solutions.