bad-data-guide

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

Results 12 bad-data-guide issues

Via Chris Wright: "I wanted to add that the date 1960-01-01 can also be a suspicious date as this is the 0 date for a data manipulation program called SAS....
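A minimal sketch of how such a check might look in Python. The 1960-01-01 value comes from the note above; the Unix epoch (1970-01-01) is added here as a further assumption about what is worth flagging, and the names `SUSPICIOUS_DATES` and `flag_suspicious_dates` are hypothetical.

```python
from datetime import date

# Dates that correspond to "zero" in common systems.
# 1960-01-01 (SAS) is taken from the note above; 1970-01-01 (Unix epoch)
# is an extra entry assumed to be worth flagging as well.
SUSPICIOUS_DATES = {
    date(1960, 1, 1): "SAS zero date",
    date(1970, 1, 1): "Unix epoch",
}


def flag_suspicious_dates(values):
    """Return (index, value, reason) for each date equal to a known zero date."""
    return [
        (i, d, SUSPICIOUS_DATES[d])
        for i, d in enumerate(values)
        if d in SUSPICIOUS_DATES
    ]


sample = [date(1985, 3, 14), date(1960, 1, 1), date(1970, 1, 1)]
for index, value, reason in flag_suspicious_dates(sample):
    print(index, value.isoformat(), reason)
```

A hit is only a prompt to look closer: some records really are dated 1960-01-01.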

In addition to China and Pakistan, [Rwanda](https://www.inclusivesecurity.org/how-women-rebuilt-rwanda/) also mandates that its parliament be 30% women, though it tends to do much better than this figure.

Listing two more sources of potential confusion when working with international data.

Excel has the "feature" that anything remotely resembling a date is forcibly converted to a date. It is usually not possible to figure out which string was originally entered....

Aren't they actually December 31st, 1899, or something similarly slightly off, due to a leap year bug or something?

https://source.opennews.org/en-US/learning/distrust-your-data/

Via Chris Wright: "Finally record counts in general should be checked when receiving and loading data. Excel gives people the option to add in new lines within cells, this is...
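One way to see why line breaks inside cells matter for record counts, sketched in Python under the assumption that the data arrives as CSV text with quoted fields. The function name `count_lines_and_records` and the sample data are made up for illustration.

```python
import csv
import io


def count_lines_and_records(text):
    """Return (physical line count, parsed CSV record count) for CSV text."""
    physical_lines = len(text.splitlines())
    # newline="" lets the csv module handle line breaks inside quoted fields.
    records = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))
    return physical_lines, records


# One quoted cell contains a line break, so the text spans more
# physical lines than it has records.
sample = 'id,note\n1,"first line\nsecond line"\n2,plain\n'
lines, records = count_lines_and_records(sample)
print(lines, records)  # 4 3 (both counts include the header row)
```

If the two numbers differ, a naive line count (for example from `wc -l`) will not match the number of records actually loaded.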

Via Chris Wright: "Also, I'm always suspicious of row counts that are multiples of 1000 or 100 as these are often sample data sets and therefore missing records. I generally...
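A small Python sketch of that check. The helper name `looks_like_sample_size` is hypothetical, and the bases of 1000 and 100 are the ones mentioned in the note above.

```python
def looks_like_sample_size(row_count, bases=(1000, 100)):
    """Return the first base that evenly divides row_count, or None.

    Bases are tried in the order given, so the default reports 1000
    before 100.
    """
    if row_count <= 0:
        return None
    for base in bases:
        if row_count % base == 0:
            return base
    return None


for count in (5000, 300, 4321):
    base = looks_like_sample_size(count)
    if base is not None:
        print(f"{count} rows: exact multiple of {base}, possibly a sample")
    else:
        print(f"{count} rows: no round-number warning")
```

A round count is only a hint that the file may be a sample or a truncated export, so it is worth confirming the expected total with the data provider.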

Excel can interpret a text value as scientific notation. For example, a serial code such as 100E10 will be converted to 1.0E+12.
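Before opening a file in a spreadsheet, identifiers at risk can be found with a pattern check. The following is a rough Python sketch: the regular expression is an assumption about what "looks like scientific notation" and is not Excel's actual parsing rule, and the name `looks_like_scientific_notation` is hypothetical.

```python
import re

# Digits, an optional decimal part, then E and an exponent.
# This approximates the risky shape; it is not Excel's real rule.
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?[eE][+-]?\d+$")


def looks_like_scientific_notation(value):
    """Return True if a text value could be misread as a number like 1E+12."""
    return bool(SCI_NOTATION.match(value.strip()))


codes = ["100E10", "AB-1234", "1E5", "E100"]
at_risk = [code for code in codes if looks_like_scientific_notation(code)]
print(at_risk)  # ['100E10', '1E5']

# The numeric reading of the example code is 10**12.
print(float("100E10") == 1e12)  # True
```

Columns containing such values are safer imported explicitly as text.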

Excel import will convert text that looks somewhat like a date to a date; this has been noted as a common problem with human genetic data, as several gene symbols...
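A hedged Python sketch for spotting identifiers that a spreadsheet import might turn into dates. The pattern (a month name or abbreviation followed by one or two digits) is an assumption, not Excel's real conversion logic, and the sample symbols and the name `looks_like_date` are chosen only for illustration.

```python
import re

# Month name or common abbreviation, an optional separator, then 1-2 digits.
# This is a heuristic for date-like text, not Excel's actual rule.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|JUL|JULY|AUG|"
    r"SEP|SEPT|OCT|NOV|DEC)[-/ ]?\d{1,2}$",
    re.IGNORECASE,
)


def looks_like_date(symbol):
    """Return True if a symbol has a month-plus-number shape."""
    return bool(DATE_LIKE.match(symbol.strip()))


symbols = ["SEPT2", "MARCH1", "TP53", "DEC1", "BRCA1"]
at_risk = [s for s in symbols if looks_like_date(s)]
print(at_risk)  # ['SEPT2', 'MARCH1', 'DEC1']
```

Running a check like this on identifier columns before and after a spreadsheet round trip can show whether any values were silently changed.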