
Process Data from Dirty to Clean

Maung Agus Sutikno
4 min read · Nov 20, 2021


Process Data from Dirty to Clean is the fourth course in the Google Data Analytics program on Coursera. It covers testing, cleaning, and transforming data. We also learn to apply SQL in Google BigQuery with real data. The last part of the course gives practical guidance on building a resume to apply for a data analyst position.

Types of dirty data (Google Data Analytics)

What can we do, as analysts, to tackle dirty data like the types shown above?

In a multinational company operating in a global economy, a data analyst often faces data in many different forms. It is therefore important to check data integrity while processing the data, for example by verifying that date formats are consistent. Doing so helps the analysis serve the business objectives efficiently. Some guiding equations about data integrity and business objectives:
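The date-format check mentioned above can be sketched in a few lines of Python. This is only an illustration, not material from the course: the function name `find_invalid_dates`, the expected format `%Y-%m-%d`, and the sample values are all assumptions made for this example.

```python
from datetime import datetime


def find_invalid_dates(values, fmt="%Y-%m-%d"):
    """Return the entries that do not parse under the expected date format."""
    invalid = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # Either the layout is wrong or the calendar date is impossible
            invalid.append(value)
    return invalid


# Hypothetical raw column that mixes regional date conventions
raw_dates = ["2021-11-20", "20/11/2021", "2021-02-30", "2021-12-01"]
print(find_invalid_dates(raw_dates))
```

Here the second entry is flagged because it uses a day/month/year layout, and the third because February 30 does not exist. In practice, the flagged rows would be reviewed or converted rather than silently dropped.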

If we find issues with our data, such as data errors or insufficient data, we can follow this decision tree:

The sample in our data analysis must be sufficiently large. Based on the Central Limit Theorem (CLT) in probability and statistics, a commonly cited minimum sample size is 30. Having said that…
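To get a feel for the 30-sample guideline, here is a small simulation sketch, assuming a skewed (exponential) population with mean 1 and standard deviation 1. The function name `simulate_sample_means`, the number of trials, and the seed are arbitrary choices for illustration, not part of the course material.

```python
import math
import random
import statistics


def simulate_sample_means(sample_size, trials=2000, seed=42):
    """Draw many samples from a skewed population and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(sample_size=30)
center = statistics.fmean(means)
spread = statistics.stdev(means)

# The CLT predicts the sample means cluster around the population mean (1.0)
# with a spread of roughly sigma / sqrt(n) = 1 / sqrt(30).
print(f"mean of sample means:   {center:.3f}")
print(f"spread of sample means: {spread:.3f}")
print(f"CLT prediction:         {1 / math.sqrt(30):.3f}")
```

Even though individual values from this population are heavily skewed, the means of samples of size 30 should land close to the population mean, with a spread near the CLT prediction. Rerunning with a smaller `sample_size` shows how the spread grows as the sample shrinks.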
