Data Cleaning Project in SQL
- Ayushi Gupta
- Aug 17, 2025
- 1 min read
Steps followed:
Create schema
Load the data into a table
Create a staging table to be used for analysis (the setup for these first three steps is sketched right after this list)
Remove duplicates
Standardize the data
Remove any unnecessary rows/columns (decide which are the best to keep for further analysis)
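Here's a minimal sketch of the setup steps, assuming a layoffs-style dataset; the schema, table, and column names are illustrative, not necessarily the exact ones I used:

```sql
-- Create a schema for the project, then work on a staging copy
-- so the raw data stays untouched. Names here are assumptions.
CREATE SCHEMA world_layoffs;
USE world_layoffs;

-- The raw CSV can be loaded into a `layoffs` table with MySQL
-- Workbench 8.0's Table Data Import Wizard.

CREATE TABLE layoffs_staging LIKE layoffs;

INSERT INTO layoffs_staging
SELECT *
FROM layoffs;
```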

Snippet of finding duplicates in a more advanced manner in MySQL Workbench 8.0
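Roughly, the trick is a window function: number the copies of each "identical" row and flag everything past the first. A sketch, assuming illustrative column names for a layoffs-style table:

```sql
-- With no primary key, partition by every column that should make a
-- record unique; row_num > 1 means the row is a repeat.
WITH duplicate_cte AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY company, location, industry, total_laid_off,
                        percentage_laid_off, `date`, stage, country,
                        funds_raised_millions
         ) AS row_num
  FROM layoffs_staging
)
SELECT *
FROM duplicate_cte
WHERE row_num > 1;
```

MySQL won't let you DELETE straight out of a CTE, so one common workaround is to insert this result (row_num included) into a second staging table, say layoffs_staging2, and delete the rows with row_num > 1 there.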
Data standardization 
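A few hedged examples of what standardization can look like, using the same assumed layoffs_staging2 table and columns:

```sql
-- Strip stray whitespace.
UPDATE layoffs_staging2
SET company = TRIM(company);

-- Collapse label variants like 'Crypto' vs. 'Cryptocurrency'.
UPDATE layoffs_staging2
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

-- Drop the stray trailing period: 'United States.' -> 'United States'.
UPDATE layoffs_staging2
SET country = TRIM(TRAILING '.' FROM country)
WHERE country LIKE 'United States%';
```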
Removing unnecessary data
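As a sketch, with the same assumed names, this step boils down to deleting rows that carry no usable values and dropping helper columns:

```sql
-- Rows with no layoff figures at all add nothing to the analysis.
DELETE FROM layoffs_staging2
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

-- The row_num column was only needed for de-duplication.
ALTER TABLE layoffs_staging2
DROP COLUMN row_num;
```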
Takeaways:
1. Duplicate rows are sneaky, but patterns reveal them. (Even without a primary key, you can catch duplicates by thinking about what makes a record “unique.”)
2. Consistency matters more than completeness. A simple trim or standardization (“Crypto” vs. “Cryptocurrency”, “United States” vs. “United States.”) can dramatically improve the quality of your analysis.
3. Missing values aren’t always a dead end. Instead of dropping rows with blanks, you can sometimes teach your dataset to fix itself (see the sketch below).
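A minimal sketch of that self-fix idea, again with assumed names: fill a blank industry from another row for the same company.

```sql
-- Convert blanks to NULL so they are easy to match on.
UPDATE layoffs_staging2
SET industry = NULL
WHERE industry = '';

-- Borrow the value from a sibling row of the same company.
UPDATE layoffs_staging2 t1
JOIN layoffs_staging2 t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;
```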
I used Alex the Analyst's guide for this project: https://www.youtube.com/watch?v=4UltKCnnnTA
