AIF360
Using BinaryLabelDataset with pyspark dataframe
Hi,
I am already using BinaryLabelDataset to generate fairness metrics, and it works fine with average-sized dataframes. Now, because of some preprocessing steps in one of my pipelines, I need much more memory and have to support large CSV files (e.g. 10GB+), so I switched to pyspark.
My question is: does BinaryLabelDataset also work with a pyspark dataframe, or do I need to convert the pyspark dataframe to a pandas dataframe (and thereby lose the distributed nature of pyspark while still risking running out of memory)?
Thanks in advance