Tabular Data Cleaning: COVID-19

Hands-on workshop focused on preprocessing, handling missing values, and analyzing real-world COVID-19 tabular datasets.

This workshop covers one of the most critical and time-consuming steps in any data science pipeline: data cleaning and preprocessing. Using messy, real-world tabular data from the COVID-19 pandemic, we explore practical techniques for preparing datasets for robust machine learning models.

Workshop Instructor: Zahra Amini

The complete repository includes Jupyter Notebooks demonstrating exploratory data analysis (EDA), handling missing values, and feature engineering on clinical and epidemiological data.

🚀 ACCESS WORKSHOP MATERIALS ON GITHUB


Workshop Highlights

Working with real-world healthcare data presents unique challenges. This workshop curriculum covers:

  • Exploratory Data Analysis (EDA): Uncovering hidden patterns, epidemiological trends, and anomalies within the COVID-19 dataset.
  • Handling Missing & Messy Data: Practical strategies for imputing missing values, dropping invalid records, and standardizing inconsistent formats.
  • Feature Engineering: Creating meaningful predictive features from raw dates, categorical variables, and numerical columns to improve downstream model performance.
  • Outlier Detection: Identifying and managing statistical outliers that can skew predictive healthcare models.
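
To make the cleaning and feature engineering topics above concrete, here is a minimal sketch in Pandas. It is not taken from the workshop notebooks: the toy records, the column names (`report_date`, `sex`, `age`, `new_cases`), and the helper `clean_records` are all hypothetical, and median imputation is just one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; column names are illustrative only and are
# not taken from the workshop dataset.
raw = pd.DataFrame({
    "report_date": ["2020-03-01", "2020-03-02", "not recorded", "2020-03-04"],
    "sex": ["F", "female", " M ", None],
    "age": [34, np.nan, 51, 29],
    "new_cases": [10, 12, np.nan, 15],
})


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize formats, drop invalid rows, impute, and derive date features."""
    out = df.copy()

    # Standardize inconsistent formats: unparseable dates become NaT.
    out["report_date"] = pd.to_datetime(
        out["report_date"], format="%Y-%m-%d", errors="coerce"
    )

    # Normalize free-text category labels to a fixed vocabulary.
    labels = out["sex"].str.strip().str.lower()
    out["sex"] = labels.map({"f": "F", "female": "F", "m": "M", "male": "M"})

    # Drop invalid records: here, rows without a usable report date.
    out = out.dropna(subset=["report_date"])

    # Impute missing ages with the median of the remaining rows
    # (an assumption; other strategies may suit the data better).
    out["age"] = out["age"].fillna(out["age"].median())

    # Feature engineering from raw dates.
    out["day_of_week"] = out["report_date"].dt.dayofweek  # Monday = 0
    out["month"] = out["report_date"].dt.month

    return out.reset_index(drop=True)


cleaned = clean_records(raw)
print(cleaned)
```

In this sketch the row with the unparseable date is dropped before the median is computed, so the imputed value reflects only the records that are kept. Whether to drop or impute first is a design choice that depends on the dataset.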

Technologies used: Python, Pandas, NumPy, Matplotlib, Seaborn.
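
As an illustration of the outlier detection topic with the libraries listed above, the following sketch flags values using the common interquartile range (IQR) rule. The function name `iqr_outlier_mask`, the multiplier `k = 1.5`, and the sample counts are assumptions made for this example; the workshop materials may use a different method.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k * IQR, Q3 + k * IQR]. Missing values are never flagged."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical daily case counts with one implausible spike.
daily_cases = pd.Series([12, 15, 14, 13, 16, 15, 400, 14], name="new_cases")

mask = iqr_outlier_mask(daily_cases)
flagged = daily_cases[mask]      # values to inspect manually
retained = daily_cases[~mask]    # values kept for modeling

print(flagged)
```

Flagged values are best reviewed rather than deleted automatically, since in epidemiological data a spike can be a reporting backlog, a data entry error, or a genuine surge.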