Extract, Transform, Load
There are over 500,000 injuries each year caused by exercise equipment.

Predict injuries before they happen.
This project automates ETL for exercise data, uses KNN for safety scores, and generates recommendations and visuals to boost retention.
Project Purpose and Key Features
This project is focused on creating a safer and more engaging gym environment. We want to reduce member injuries by giving gym-goers easy access to essential information on proper exercise techniques. This information is presented as clear, data-driven visualizations for every exercise, organized by its difficulty level and the muscle group it targets.
The purpose is two-fold. First, it directly enhances safety and improves overall client satisfaction with the gym. Second, by making the gym a safer place to train, client loyalty and retention naturally increase. Ultimately, this system encourages collaboration and shared learning among all gym members. This is achieved through a variety of features:
- Automated ETL for gym datasets
- KNN safety score prediction
- GridSearchCV optimization
- Difficulty-based recommendations
- Stats summaries
- Cluster visuals
- Full logging for transparency
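To illustrate how the KNN and GridSearchCV features listed above can fit together, here is a minimal sketch using scikit-learn. It is not the project's actual model: the two numeric features, the synthetic safety score, the choice of a regressor rather than a classifier, and the parameter grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): two made-up numeric features per
# exercise, e.g. an encoded difficulty and an encoded equipment risk.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 2))

# Hypothetical safety score on a 0-10 scale, with a little noise added.
y = 10 * (1 - 0.6 * X[:, 0] - 0.4 * X[:, 1]) + rng.normal(0, 0.2, size=120)

# KNN is distance-based, so features are scaled before the neighbor search.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsRegressor()),
])

# Illustrative grid (assumption): GridSearchCV tries each combination
# with 5-fold cross-validation and keeps the best one.
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
}
search = GridSearchCV(
    pipeline, param_grid, cv=5, scoring="neg_mean_absolute_error"
)
search.fit(X, y)

best_k = search.best_params_["knn__n_neighbors"]
predicted = search.predict([[0.2, 0.3]])  # score for one unseen exercise
```

After fitting, `search.best_params_` holds the selected hyperparameters, and `search.predict` uses the pipeline refit on the full data. If the safety score were categorical instead, `KNeighborsClassifier` with a classification metric would be the analogous choice.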

The ETL Pipeline
Datasets
The data for this project was sourced from Kaggle. As long as the required dependencies are installed, you do not need to download the dataset(s) manually.
Most of the data comes from megaGymDataset by Niharika Pandit. It contains roughly 3,000 exercises, along with additional fields that were used during processing.
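As a rough illustration of what an extract, transform, load pass over exercise data like this can look like, the sketch below uses only the Python standard library. It is not the project's pipeline: the inline sample rows are invented, the column names (`Title`, `BodyPart`, `Level`, `Rating`) are assumed to resemble those in the dataset, and the sample replaces the automatic Kaggle download so the example runs on its own.

```python
import csv
import io

# Invented sample rows (assumption); a real run would read the Kaggle file.
RAW_CSV = """Title,BodyPart,Level,Rating
Barbell Squat,Quadriceps,Intermediate,9.1
 Push-Up ,Chest,Beginner,
Barbell Squat,Quadriceps,Intermediate,9.1
Plank,Abdominals,beginner,8.4
"""


def extract(source):
    """Read CSV rows into a list of dictionaries."""
    return list(csv.DictReader(source))


def transform(rows):
    """Trim whitespace, normalize level casing, drop duplicate titles."""
    seen = set()
    cleaned = []
    for row in rows:
        title = row["Title"].strip()
        if title in seen:
            continue  # keep only the first occurrence of each exercise
        seen.add(title)
        rating_text = row["Rating"].strip()
        cleaned.append({
            "title": title,
            "body_part": row["BodyPart"].strip(),
            "level": row["Level"].strip().title(),
            # A missing rating stays None rather than being guessed.
            "rating": float(rating_text) if rating_text else None,
        })
    return cleaned


def load(rows):
    """Group exercise titles by (difficulty level, muscle group)."""
    grouped = {}
    for row in rows:
        key = (row["level"], row["body_part"])
        grouped.setdefault(key, []).append(row["title"])
    return grouped


exercises = transform(extract(io.StringIO(RAW_CSV)))
by_group = load(exercises)
```

Here the "load" step simply builds an in-memory lookup keyed by difficulty and muscle group, mirroring how the visualizations are organized; writing to a file or database would be an equally valid target.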
ETL Process


