
Changing upload process for lazy-loaded DataFrames

Status: Open. AnimatorJoe opened this issue 5 months ago (0 comments).

In the original DataFrame upload implementation for Snowflake and PySpark DataFrames, each DataFrame is loaded entirely into memory before it is uploaded (PySpark, Snowflake).

However, this breaks down when a DataFrame is larger than the memory available on the driver (PySpark, Snowflake).

This PR instead processes the DataFrames in batches, so only one batch has to be held in driver memory at a time, which solves the memory problem.
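A minimal sketch of batch-wise uploading, not the PR's actual implementation. It assumes the rows arrive from a lazy iterator (for example PySpark's `DataFrame.toLocalIterator()`, which pulls partitions to the driver one at a time) and that `upload_batch` is a hypothetical callback standing in for whatever request actually sends data to the server. `iter_batches`, `upload_in_batches`, and the default `batch_size` are illustrative names and values.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` rows without materializing the whole input."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        # islice pulls only the next `batch_size` items from the underlying iterator
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def upload_in_batches(
    rows: Iterable[T],
    upload_batch: Callable[[List[T]], None],  # hypothetical upload callback
    batch_size: int = 10_000,  # illustrative default, not taken from the PR
) -> int:
    """Send rows batch by batch so only one batch is in memory; return the row count."""
    total = 0
    for batch in iter_batches(rows, batch_size):
        upload_batch(batch)  # e.g. one request per batch (assumption)
        total += len(batch)
    return total
```

For example, `upload_in_batches(range(25), send, batch_size=10)` calls `send` three times, with batches of 10, 10, and 5 rows. Because `iter_batches` consumes its input lazily, peak memory is bounded by `batch_size` rather than by the size of the DataFrame.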

AnimatorJoe, Jan 09 '24 19:01