
Using BinaryLabelDataset with pyspark dataframe

Open ilirosmanaj opened this issue 2 years ago • 0 comments

Hi,

I am already using BinaryLabelDataset to generate fairness metrics, and it works fine with average-sized dataframes. Now, because of some preprocessing steps in one of my pipelines, I need much more memory and have to support large CSV files (e.g. 10GB+), so I switched to pyspark.

My question is: does BinaryLabelDataset also work with a pyspark dataframe, or do I need to convert the pyspark dataframe to a pandas dataframe first (which basically means losing the distributed property of pyspark and still risking running out of memory)?
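To make the concern concrete, here is a minimal sketch of the kind of workaround I have in mind, assuming only the basic group metrics are needed. Statistical parity difference and disparate impact depend only on the counts per (protected attribute, label) pair, so in principle those counts could be aggregated in a distributed way and the metrics computed from the small result. This is plain Python, not AIF360 or pyspark API: `count_groups` and `fairness_from_counts` are hypothetical helper names, and the `Counter` only stands in for whatever a distributed group-by would return.

```python
from collections import Counter


def count_groups(rows):
    """Count (protected_attribute, label) pairs.

    Hypothetical stand-in for a distributed group-by count; here it
    simply iterates over an in-memory list of pairs.
    """
    return Counter((int(p), int(y)) for p, y in rows)


def fairness_from_counts(counts, privileged=1, favorable=1):
    """Compute two group fairness metrics from aggregated counts only.

    Assumes both groups are non-empty and the privileged group has at
    least one favorable outcome; otherwise a ZeroDivisionError is raised.
    """
    def favorable_rate(want_privileged):
        total = 0
        fav = 0
        for (p, y), n in counts.items():
            if (p == privileged) == want_privileged:
                total += n
                if y == favorable:
                    fav += n
        return fav / total

    priv_rate = favorable_rate(True)
    unpriv_rate = favorable_rate(False)
    return {
        # P(favorable | unprivileged) - P(favorable | privileged)
        "statistical_parity_difference": unpriv_rate - priv_rate,
        # P(favorable | unprivileged) / P(favorable | privileged)
        "disparate_impact": unpriv_rate / priv_rate,
    }


# Toy data: (protected_attribute, label) pairs, 4 rows per group.
rows = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
counts = count_groups(rows)
metrics = fairness_from_counts(counts)
print(metrics)
```

This only covers metrics that reduce to group counts; anything that needs the full dataset object would still require a pandas dataframe as far as I can tell.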

Thanks in advance

ilirosmanaj, Apr 30 '22 21:04