spark-df-profiling

Showing incorrect missing data in HTML report

Open harika1419 opened this issue 5 years ago • 6 comments

After generating the HTML report with spark-df-profiling, the percentage of missing data is shown as 0%, even though the DataFrame has some missing data.

harika1419 avatar Jun 06 '19 06:06 harika1419

Could you give an example?

adutchengineer avatar Jul 31 '19 15:07 adutchengineer

Is this fixed yet? Mine also wrongly shows missing data as 0%.

shhanani avatar Jan 07 '20 14:01 shhanani

@harika1419 I think I found the issue. It's on line 397. Change it to this:

# Assumes: from pyspark.sql.functions import col, count, countDistinct
results_data = (df.select(column).na.drop()
                  .agg(countDistinct(col(column)).alias("distinct_count"),
                       count(col(column)).alias("count")).toPandas())

@julioasotodv you might need to look at this solution
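
One possible reading of why the change above matters: in Spark, count(col) skips null but still counts NaN, while na.drop() removes rows containing either null or NaN. If the report derives the missing percentage as (total rows - count) / total rows, which is an assumption here and not something confirmed in this thread, then NaN values would never show up as missing without the na.drop() call. The plain-Python sketch below only mimics those counting semantics; it does not use Spark, and missing_stats and drop_nan are hypothetical names for illustration.

```python
import math


def is_nan(value):
    """True only for float NaN (None is handled separately)."""
    return isinstance(value, float) and math.isnan(value)


def missing_stats(values, drop_nan):
    """Return (count, n_missing, p_missing) for one column.

    drop_nan=False mimics count(col): None is skipped, NaN is counted.
    drop_nan=True mimics na.drop() followed by count(col): both None
    and NaN are skipped.
    The formula n_missing = total - count is an assumption about how
    a report could derive its missing figure.
    """
    total = len(values)
    if drop_nan:
        kept = [v for v in values if v is not None and not is_nan(v)]
    else:
        kept = [v for v in values if v is not None]
    count = len(kept)
    n_missing = total - count
    p_missing = n_missing / total if total else 0.0
    return count, n_missing, p_missing


column = [1.0, float("nan"), 3.0, float("nan")]
print(missing_stats(column, drop_nan=False))  # (4, 0, 0.0)
print(missing_stats(column, drop_nan=True))   # (2, 2, 0.5)
```

With NaN counted as a regular value, the column appears to have 0% missing data, which matches the symptom reported here. Whether this is the actual cause in any given Spark version is not established by this thread.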

shhanani avatar Jan 08 '20 06:01 shhanani

Hi, that issue was fixed after upgrading Spark from 1.6 to 2.3.3.

harika1419 avatar Jan 08 '20 11:01 harika1419

Hi @harika1419, thanks for letting me know. I'm facing this issue on Spark 2.4.2, which is why I thought it wasn't fixed yet.

shhanani avatar Jan 08 '20 13:01 shhanani

I'm on Spark 3.1.0, and it still shows the wrong value. The number of zeros is also wrong.

Strauman avatar Jun 07 '21 14:06 Strauman