
Data Cleaning Project in SQL

  • Writer: Ayushi Gupta
  • Aug 17, 2025
  • 1 min read

Steps followed:

  1. Create schema

  2. Load the data into a table

  3. Create a staging table to be used for analysis (steps 1-3 are sketched below)

  4. Remove duplicates

  5. Standardize the data

  6. Remove any unnecessary rows/columns (decide which are best to keep for further analysis)
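
    The first three steps are setup: create a schema, load the raw file, and copy it into a staging table so the original data stays untouched. A minimal sketch in MySQL, using hypothetical schema and table names:

    CREATE SCHEMA world_layoffs;
    USE world_layoffs;

    -- load the raw CSV into a table (e.g. `layoffs`) with Workbench's
    -- Table Data Import Wizard, then work on a copy:
    CREATE TABLE layoffs_staging LIKE layoffs;
    INSERT INTO layoffs_staging
    SELECT * FROM layoffs;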

    Snippet of finding duplicates in a more advanced manner in MySQL Workbench 8.0
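
    One way to catch duplicates without a primary key is a window-function pass (available in MySQL 8.0). A minimal sketch, with placeholder column names standing in for whatever makes a record unique:

    WITH duplicate_cte AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   -- partition by every column that should make a record unique
                   PARTITION BY company, location, industry,
                                total_laid_off, percentage_laid_off,
                                `date`, stage, country
               ) AS row_num
        FROM layoffs_staging
    )
    SELECT *
    FROM duplicate_cte
    WHERE row_num > 1;

    MySQL will not let you DELETE from a CTE, so a common workaround is to insert the numbered rows into a second staging table (layoffs_staging2 below) and delete where row_num > 1.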


    Data standardization
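
    A sketch of the kinds of fixes this covers, using the examples from the takeaways below (names are placeholders; in Workbench you may need to disable safe-update mode before running UPDATEs without a key):

    -- trim stray whitespace
    UPDATE layoffs_staging2
    SET company = TRIM(company);

    -- collapse label variants such as 'Cryptocurrency' into one value
    UPDATE layoffs_staging2
    SET industry = 'Crypto'
    WHERE industry LIKE 'Crypto%';

    -- strip a trailing period: 'United States.' -> 'United States'
    UPDATE layoffs_staging2
    SET country = TRIM(TRAILING '.' FROM country);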

    Removing unnecessary data
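
    Which rows and columns to drop is a judgment call; the statements themselves are simple. A sketch with placeholder names:

    -- rows with no usable metrics add nothing to the analysis
    DELETE FROM layoffs_staging2
    WHERE total_laid_off IS NULL
      AND percentage_laid_off IS NULL;

    -- the helper column from the de-duplication step is no longer needed
    ALTER TABLE layoffs_staging2
    DROP COLUMN row_num;
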
  7. Takeaways:

    1. Duplicate rows are sneaky, but patterns reveal them. (Even without a primary key, you can catch duplicates by thinking about what makes a record “unique.”)

    2. Consistency matters more than completeness. A simple trim or standardization (“Crypto” vs. “Cryptocurrency”, “United States” vs. “United States.”) can dramatically improve the quality of your analysis.

    3. Missing values aren’t always a dead end. Instead of dropping rows with blanks, you can sometimes teach your dataset to fix itself (see the self-join sketch below).
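
    A sketch of that idea as a self-join update, with hypothetical names: rows that share a company can donate their industry value to rows where it is blank.

    -- fill a blank industry from another row for the same company
    UPDATE layoffs_staging2 t1
    JOIN layoffs_staging2 t2
        ON t1.company = t2.company
    SET t1.industry = t2.industry
    WHERE (t1.industry IS NULL OR t1.industry = '')
      AND t2.industry IS NOT NULL
      AND t2.industry != '';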



I used Alex the Analyst's guide for this project: https://www.youtube.com/watch?v=4UltKCnnnTA

 
 
 
