
load_file & save_file with datasets larger than worker resources

Open dimberman opened this issue 3 years ago • 0 comments

The current loading path reads the entire input file into a single in-memory dataframe, so the maximum file size is bounded by a single worker's memory. This will not scale to full production use cases.

Proposed solution:

  1. By default, use smart_open to read the input file in chunks and push a series of smaller dataframes into the database
  2. For cloud data warehouses (e.g. BigQuery or Snowflake), add dedicated code paths that use each system's native bulk-load mechanism
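A minimal sketch of option 1. This is illustrative only: `load_file_chunked` is a hypothetical helper, and a local CSV plus SQLite stand in for the real streaming layer (smart_open, which also handles `s3://` / `gs://` URIs) and the target database. The key point is that peak memory is bounded by the chunk size, not the file size.

```python
import csv
import sqlite3
import tempfile

import pandas as pd


def load_file_chunked(path, conn, table, chunksize=2):
    """Hypothetical helper: append `path` to `table` one chunk at a time."""
    total = 0
    # read_csv(chunksize=...) yields dataframes of at most `chunksize` rows,
    # so we never materialize the whole file as a single dataframe.
    for chunk in pd.read_csv(path, chunksize=chunksize):
        chunk.to_sql(table, conn, if_exists="append", index=False)
        total += len(chunk)
    return total


# Demo with a tiny file; a production chunksize would be far larger.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])
    writer.writerows([(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")])
    path = f.name

conn = sqlite3.connect(":memory:")
rows = load_file_chunked(path, conn, "people", chunksize=2)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(rows, count)  # 5 5
```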

Acceptance criteria:

  • Should be able to load 100GB of data into BigQuery, Snowflake, and Redshift
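For the warehouse targets above, option 2 would bypass worker dataframes entirely and hand the file URI to the database's own bulk loader (e.g. Snowflake's `COPY INTO`, Redshift's `COPY`, or a BigQuery load job). A hedged sketch of the dispatch idea; the function name is hypothetical and the generated SQL is simplified (real statements also need credentials or stages):

```python
def native_load_sql(conn_type: str, table: str, uri: str) -> str:
    """Illustrative bulk-load statement per warehouse type.

    Real implementations must also supply credentials/stage config,
    file format options, and error handling.
    """
    if conn_type == "snowflake":
        return f"COPY INTO {table} FROM '{uri}'"
    if conn_type == "redshift":
        return f"COPY {table} FROM '{uri}'"
    # BigQuery would use a load job API call rather than SQL.
    raise NotImplementedError(conn_type)


print(native_load_sql("snowflake", "sales", "s3://bucket/data.csv"))
```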

dimberman • Feb 07 '22 15:02