Tabular Data Cleaning: COVID-19

This hands-on workshop covers one of the most critical and time-consuming steps in any data science pipeline: data cleaning and preprocessing. Using real-world, messy tabular data from the COVID-19 pandemic, we walk through practical techniques for preparing datasets for machine learning models.

Workshop Instructor: Zahra Amini

The complete repository includes Jupyter Notebooks demonstrating exploratory data analysis (EDA), handling missing values, and feature engineering on clinical and epidemiological data.

🚀 ACCESS WORKSHOP MATERIALS ON GITHUB

Workshop Highlights

Working with real-world healthcare data presents unique challenges. This workshop curriculum covers:

Exploratory Data Analysis (EDA): Uncovering hidden patterns, epidemiological trends, and anomalies within the COVID-19 dataset.
Handling Missing & Messy Data: Strategies for imputing missing values, dropping invalid records, and standardizing inconsistent formats.
Feature Engineering: Creating meaningful predictive features from raw dates, categorical variables, and numerical columns to improve downstream model performance.
Outlier Detection: Identifying and managing statistical outliers that can skew predictive healthcare models.
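
To give a flavour of the cleaning topics above, here is a minimal sketch on a made-up table. It is not taken from the workshop notebooks: the column names (report_date, sex, age, new_cases), the toy values, and the specific choices (median imputation, the 1.5 × IQR rule) are all assumptions for illustration, and a real clinical dataset would likely call for more careful, domain-informed rules.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; column names and values are illustrative only.
raw = pd.DataFrame({
    "report_date": ["2020-03-01", "2020-03-02", "not recorded",
                    "2020-03-04", "2020-03-05", "2020-03-06"],
    "sex": ["F", "female", "M", " male", None, "F"],
    "age": [34, np.nan, 51, 47, 29, 410],
    "new_cases": [10, 12, -3, 15, 11, 14],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Standardize inconsistent formats: parse dates (bad values become NaT)
    # and collapse the different spellings of each category.
    out["report_date"] = pd.to_datetime(
        out["report_date"], format="%Y-%m-%d", errors="coerce"
    )
    out["sex"] = (
        out["sex"].str.strip().str.lower()
        .map({"f": "F", "female": "F", "m": "M", "male": "M"})
        .fillna("unknown")
    )

    # Drop invalid records: unparseable dates and negative case counts.
    valid = out["report_date"].notna() & (out["new_cases"] >= 0)
    out = out[valid].copy()

    # Flag (rather than delete) age outliers using the 1.5 * IQR rule.
    q1, q3 = out["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["age_outlier"] = (out["age"] < q1 - 1.5 * iqr) | (out["age"] > q3 + 1.5 * iqr)

    # Impute missing ages with the median of the non-outlier values.
    median_age = out.loc[~out["age_outlier"], "age"].median()
    out["age"] = out["age"].fillna(median_age)

    # Feature engineering from the raw date column.
    out["day_of_week"] = out["report_date"].dt.dayofweek  # Monday = 0
    out["is_weekend"] = out["day_of_week"] >= 5

    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Flagging outliers in a separate column instead of dropping them keeps the decision reversible, which matters when an extreme value might be a data-entry error or a genuine clinical observation.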

Technologies used: Python, Pandas, NumPy, Matplotlib, Seaborn.
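
As a rough illustration of how the Pandas and NumPy parts of this stack come together in a first EDA pass, the sketch below summarizes missingness, per-group statistics, and a smoothed daily trend. The data frame, its column names (report_date, region, new_cases), and the helper eda_summary are hypothetical and not part of the workshop repository; the resulting tables are the kind of output one would then plot with Matplotlib or Seaborn.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts; names and numbers are made up for illustration.
daily = pd.DataFrame({
    "report_date": pd.date_range("2020-03-01", periods=10, freq="D"),
    "region": ["north", "south"] * 5,
    "new_cases": [5, 7, np.nan, 9, 12, 11, np.nan, 15, 18, 20],
})


def eda_summary(df: pd.DataFrame):
    # Share of missing values in each column.
    missing_share = df.isna().mean()

    # Basic per-region statistics; NaN values are skipped by default.
    per_region = df.groupby("region")["new_cases"].agg(["count", "mean", "max"])

    # 3-day rolling mean to smooth day-to-day noise in the trend.
    trend = (
        df.set_index("report_date")["new_cases"]
        .rolling(window=3, min_periods=1)
        .mean()
    )
    return missing_share, per_region, trend


missing_share, per_region, trend = eda_summary(daily)
print(missing_share)
print(per_region)
print(trend)
```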