connector-x
transfer data from s3 (or gcs/adls)
Describe your feature request
I'd like to read data (parquet files) directly from s3.
Thank you for releasing this super helpful project. The introductory blog mentions a plan to support transferring data from s3. Is there any update on that?
Thanks,
Hi @99snowleopards, thank you for bringing this up. Currently, we are focusing on loading data from relational databases, as in this discussion, but we do think s3 will be an important data source. It would help us decide its priority if you could share the tool you currently use to load data from s3, and the issues you have with it, here or in that discussion!
Thank you for replying. I use pandas to read the data into a df directly, or the aws CLI to cp the data locally and then read it into pandas. The issue is that it's very slow.
Have you tried using arrow? This is the fastest way I know to fetch a dataframe from s3. You can convert the arrow table to pandas afterwards.
I use the pyarrow engine - I'll try using arrow separately and converting to pandas as per your suggestion, thanks again for replying