Tabular Data Cleaning: COVID-19
Hands-on workshop focused on preprocessing, handling missing values, and analyzing real-world COVID-19 tabular datasets.
This hands-on workshop dives into one of the most critical and time-consuming steps in any data science pipeline: data cleaning and preprocessing. Using messy, real-world tabular data from the COVID-19 pandemic, we work through practical techniques for preparing datasets for robust machine learning models.
The complete repository includes Jupyter Notebooks demonstrating exploratory data analysis (EDA), handling missing values, and feature engineering on clinical and epidemiological data.
🚀 Workshop materials are available on GitHub.
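As a rough illustration of the first EDA pass described above, the sketch below profiles missing values per column and surfaces inconsistent category spellings. It is a minimal sketch, not code from the workshop notebooks: the DataFrame holds a handful of toy values invented for illustration, and the column names (`date`, `region`, `new_cases`, `age`) and the helper `missing_report` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy records invented for illustration; the column names are hypothetical
# and do not reflect the schema of the workshop's COVID-19 dataset.
df = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", None, "2020-03-04", "2020-03-05"],
    "region": ["North", "north", "South", None, "South"],
    "new_cases": [12, np.nan, 30, 25, 40],
    "age": [34, 61, np.nan, np.nan, 52],
})


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(frame) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


report = missing_report(df)
print(report)

# Summary statistics for the numeric columns (count, mean, std, quartiles).
print(df.describe())

# Frequency table for a categorical column; dropna=False keeps missing entries
# visible, and inconsistent spellings ("North" vs "north") appear as separate rows.
print(df["region"].value_counts(dropna=False))
```

Sorting the report by missing count puts the columns that most need an imputation or dropping decision at the top.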
Workshop Highlights
Real-world healthcare data is rarely clean: records arrive with gaps, inconsistent formats, and anomalous values. The workshop curriculum covers:
- Exploratory Data Analysis (EDA): Uncovering hidden patterns, epidemiological trends, and anomalies within the COVID-19 dataset.
- Handling Missing & Messy Data: Strategies for imputing missing values, dropping invalid records, and standardizing inconsistent formats.
- Feature Engineering: Creating meaningful predictive features from raw dates, categorical variables, and numerical columns to improve downstream model performance.
- Outlier Detection: Identifying and managing statistical outliers that can skew predictive healthcare models.
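The cleaning and feature-engineering items above can be sketched in a few Pandas steps. This is a minimal sketch under stated assumptions, not the workshop's own pipeline: the toy records and the column names `date`, `region`, `new_cases`, and `age` are hypothetical, and the helpers `clean` and `add_features` are illustrative names. Median imputation and dropping rows with unparseable dates are one common choice among several.

```python
import numpy as np
import pandas as pd

# Toy records invented for illustration; column names are hypothetical.
raw = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "not recorded", "2020-03-04", "2020-03-05"],
    "region": [" North", "north", "SOUTH", None, "South "],
    "new_cases": [12, np.nan, 30, 25, 40],
    "age": [34, 61, np.nan, np.nan, 52],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    # Standardize inconsistent text: trim whitespace and unify capitalization.
    out["region"] = out["region"].str.strip().str.title()
    # Parse dates; unparseable strings become NaT instead of raising an error.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    # Drop records whose date could not be recovered.
    out = out.dropna(subset=["date"]).reset_index(drop=True)
    # Impute remaining numeric gaps with the column median.
    for col in ["new_cases", "age"]:
        out[col] = out[col].fillna(out[col].median())
    # Keep missing categories as an explicit level instead of dropping the rows.
    out["region"] = out["region"].fillna("Unknown")
    return out


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    out = frame.copy()
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday = 0, Sunday = 6
    out["is_weekend"] = out["day_of_week"] >= 5
    out["days_since_start"] = (out["date"] - out["date"].min()).dt.days
    return out


clean_df = add_features(clean(raw))
print(clean_df)
```

When the cleaned data feeds a model, compute the imputation medians on the training split only and reuse them for validation and test data, so that no information leaks from the held-out rows.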
Technologies used: Python, Pandas, NumPy, Matplotlib, Seaborn.
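For the outlier-detection item, one widely used rule is Tukey's fences, which flag values more than `k` interquartile ranges (IQR) beyond the first or third quartile. The sketch below applies it with Pandas alone. It assumes a single numeric series of toy daily counts and the conventional multiplier `k = 1.5`; `iqr_bounds` is a hypothetical helper, and the workshop notebooks may use a different method or threshold.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Tukey's fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy daily counts invented for illustration; the single spike (400) stands in
# for something like a batch of backlogged reports landing on one day.
cases = pd.Series([12, 15, 14, 18, 16, 400, 17, 13], name="new_cases")

low, high = iqr_bounds(cases)
is_outlier = (cases < low) | (cases > high)
print(f"bounds: [{low}, {high}]")
print(cases[is_outlier])

# Alternative to deleting rows: cap (winsorize) extreme values at the fences,
# which keeps the record available for its other features.
capped = cases.clip(lower=low, upper=high)
print(capped)
```

With these values, Q1 = 13.75 and Q3 = 17.25, so the fences are [8.5, 22.5] and only the spike is flagged. In epidemiological time series a flagged value may be a genuine surge rather than an error, so flagging is a prompt for inspection before capping or removal.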