astro-sdk
load_file & save_file with datasets larger than worker resources
The current load path is limited to what fits in a single in-memory dataframe. This will not scale to production-sized datasets.
Proposed solution:
- By default, we can use smart_open to chunk the input file and create smaller dataframes to push into the database one at a time (see the sketch after this list)
- Given a cloud data warehouse (e.g. BigQuery or Snowflake), we can create provider-specific fast paths around how those systems optimally load data (see the BigQuery sketch at the end of this issue)
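
A minimal sketch of the chunked default path, assuming pandas, smart_open, and SQLAlchemy are available. The chunk size, function name, and connection string are placeholders for illustration, not astro-sdk API:

```python
import pandas as pd
import smart_open
from sqlalchemy import create_engine

CHUNK_ROWS = 100_000  # placeholder; tune to fit worker memory

def load_file_in_chunks(uri: str, table: str, conn_str: str) -> None:
    """Stream a (possibly remote) CSV and append it to `table` chunk by chunk."""
    engine = create_engine(conn_str)
    # smart_open transparently streams from s3://, gs://, etc.
    with smart_open.open(uri, "r") as stream:
        for chunk in pd.read_csv(stream, chunksize=CHUNK_ROWS):
            # Each chunk is a small dataframe that fits in worker memory.
            chunk.to_sql(table, engine, if_exists="append", index=False)

load_file_in_chunks("s3://my-bucket/events.csv", "events",
                    "postgresql://user:pass@host/db")
```

Because each chunk is a self-contained dataframe, peak worker memory is bounded by the chunk size regardless of the total file size.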
Acceptance criteria:
- Should be able to load 100 GB of data into BigQuery, Snowflake, and Redshift
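
For the warehouse-specific fast path, here is a hedged sketch using the google-cloud-bigquery client: rather than streaming rows through the worker, hand BigQuery the object-store URI and let it ingest the file server-side. The bucket and table names are hypothetical:

```python
from google.cloud import bigquery

def load_file_native_bq(gcs_uri: str, table_id: str) -> None:
    """Let BigQuery ingest the file directly from GCS, bypassing worker memory."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,  # infer the schema from the file
    )
    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # wait for completion; raises on failure

load_file_native_bq("gs://my-bucket/events.csv", "my-project.my_dataset.events")
```

Snowflake and Redshift have analogous server-side commands (`COPY INTO` and `COPY`), so a 100 GB file never has to pass through worker memory on any of the three targets.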