cleanlab-studio
Changing upload process for lazy loaded DataFrames
In the original upload implementation for Snowflake and PySpark DataFrames, each DataFrame is loaded entirely into memory before being uploaded (PySpark, Snowflake).
However, this can cause problems if the DataFrames are larger than the driver's memory (PySpark, Snowflake).
This PR processes the DataFrames in batches, which avoids loading an entire DataFrame into driver memory at once.
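As a rough illustration of the batching idea (not the code in this PR), the sketch below pulls rows from any lazy iterator in fixed-size chunks and passes each chunk to an upload callback. The names `iter_batches`, `upload_in_batches`, `upload_chunk`, and the default `batch_size` are hypothetical and chosen only for this example. In practice the row iterator could come from something like PySpark's `DataFrame.toLocalIterator()` or Snowpark's `DataFrame.to_local_iterator()`, though whether the PR uses those calls is an assumption.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` rows from a (possibly lazy) iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        # islice consumes only the next `batch_size` rows, so at most one
        # batch is materialized in memory at any time.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def upload_in_batches(
    rows: Iterable[T],
    upload_chunk: Callable[[List[T]], None],  # hypothetical upload callback
    batch_size: int = 10_000,  # illustrative default, not taken from the PR
) -> int:
    """Send rows to `upload_chunk` one batch at a time; return the row count."""
    total = 0
    for batch in iter_batches(rows, batch_size):
        upload_chunk(batch)
        total += len(batch)
    return total
```

With this shape, peak memory on the driver scales with `batch_size` rather than with the full DataFrame, at the cost of more round trips to the upload endpoint.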