
load_file & save_file with datasets larger than worker resources

Open dimberman opened this issue 3 years ago • 0 comments

The current loading path reads the entire input file into a single in-memory dataframe, so the maximum file size is bounded by a single worker's memory. This will not scale to full production use cases.

Proposed solution:

  1. By default, use smart_open to read the input file in chunks and push a series of smaller dataframes into the database
  2. For cloud data warehouses (e.g. BigQuery or Snowflake), add dedicated code paths that use each system's native bulk-load mechanism
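A minimal sketch of option 1. This is illustrative only: `load_file_chunked` is a hypothetical helper, and a local CSV plus SQLite stand in for the real streaming layer (smart_open, which also handles `s3://` / `gs://` URIs) and the target database. The key point is that peak memory is bounded by the chunk size, not the file size.

```python
import csv
import sqlite3
import tempfile

import pandas as pd


def load_file_chunked(path, conn, table, chunksize=2):
    """Hypothetical helper: append `path` to `table` one chunk at a time."""
    total = 0
    # read_csv(chunksize=...) yields dataframes of at most `chunksize` rows,
    # so we never materialize the whole file as a single dataframe.
    for chunk in pd.read_csv(path, chunksize=chunksize):
        chunk.to_sql(table, conn, if_exists="append", index=False)
        total += len(chunk)
    return total


# Demo with a tiny file; a production chunksize would be far larger.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])
    writer.writerows([(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")])
    path = f.name

conn = sqlite3.connect(":memory:")
rows = load_file_chunked(path, conn, "people", chunksize=2)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(rows, count)  # 5 5
```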

Acceptance criteria:

  • Should be able to load 100GB of data into BigQuery, Snowflake, and Redshift
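For the warehouse targets above, option 2 would bypass worker dataframes entirely and hand the file URI to the database's own bulk loader (e.g. Snowflake's `COPY INTO`, Redshift's `COPY`, or a BigQuery load job). A hedged sketch of the dispatch idea; the function name is hypothetical and the generated SQL is simplified (real statements also need credentials or stages):

```python
def native_load_sql(conn_type: str, table: str, uri: str) -> str:
    """Illustrative bulk-load statement per warehouse type.

    Real implementations must also supply credentials/stage config,
    file format options, and error handling.
    """
    if conn_type == "snowflake":
        return f"COPY INTO {table} FROM '{uri}'"
    if conn_type == "redshift":
        return f"COPY {table} FROM '{uri}'"
    # BigQuery would use a load job API call rather than SQL.
    raise NotImplementedError(conn_type)


print(native_load_sql("snowflake", "sales", "s3://bucket/data.csv"))
```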

dimberman • Feb 07 '22 15:02