aws-sdk-pandas
change the name of "mode" argument to awswrangler.s3.to_csv
Describe the bug
The method awswrangler.s3.to_csv accepts a "mode" argument as well as **pandas_kwargs. The "mode" argument is not passed through to Pandas; it is consumed by the awswrangler method, which also requires dataset=True when "mode" is used. In some cases, it would be useful to pass this argument through to Pandas.
If there is already a way to pass "mode" to Pandas, a documentation update would resolve this issue.
How to Reproduce
import awswrangler as wr
# ...
# load a DataFrame and name it df
# ...
# Pandas "mode"
wr.s3.to_csv(df, "some_test_file_name", mode="a", header=False)
# awswrangler expects mode="append", dataset=True
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
AWS Lambda x86_64 Architecture
Python version
3.10
AWS SDK for pandas version
3.2.0
Additional context
No response
To avoid a breaking change we can consider introducing a one-off parameter (pandas_mode for example) that is re-labeled as mode and passed to the underlying pandas method. I can open a PR and the team can discuss if this is how we'd like to move forward.
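To make the proposal concrete, here is a minimal sketch of the relabeling step only. The helper name relabel_pandas_mode is hypothetical and the parameter name pandas_mode is just the example from the comment above; neither is part of awswrangler, and the real change might look quite different.

```python
def relabel_pandas_mode(pandas_mode=None, **pandas_kwargs):
    """Build the keyword arguments that would be forwarded to pandas.

    Hypothetical helper: `pandas_mode` is re-labeled as `mode` so that it
    does not collide with awswrangler's own `mode` argument.
    """
    if "mode" in pandas_kwargs:
        # Guard for this sketch only: the pandas file mode must arrive
        # through `pandas_mode`, never through a raw `mode` key.
        raise ValueError("pass the pandas file mode as pandas_mode, not mode")
    forwarded = dict(pandas_kwargs)
    if pandas_mode is not None:
        forwarded["mode"] = pandas_mode
    return forwarded


# The call from the report, expressed with the hypothetical parameter.
forwarded = relabel_pandas_mode(pandas_mode="a", header=False)
```

With this shape, awswrangler's existing "mode" (for example "append" with dataset=True) keeps its current meaning, and only the new parameter reaches pandas.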
That makes sense and sounds great -- thanks for the prompt response.
Still investigating. This may require refactoring how we read existing S3 objects in order to support modes like append ("a").
This may not be necessary at all if there is a way to append to an existing file. For example, this seems to fail silently: it does not append the DataFrame "df" to the S3 file "target", and it does not raise an exception:
wr.s3.to_csv(df, target, dataset=True, mode='append')
Is there a different way to append to an existing file? We use pandas "append" mode primarily with pandas "chunksize" for large text files. If there is a different way to do this using awswrangler, pandas "chunksize" and "mode" are not necessary. We're looking for a way to append to an existing file rather than write a new file in the target directory.
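For reference, the pandas pattern described above can be sketched against a local file. This is only an illustration of pandas "mode" and chunked writes, not an awswrangler feature; the function name append_chunks_to_csv and the sample data are made up for the example.

```python
import os
import tempfile

import pandas as pd


def append_chunks_to_csv(chunks, local_path):
    """Write an iterable of DataFrames to one local CSV file, chunk by chunk."""
    for i, chunk in enumerate(chunks):
        first = i == 0
        chunk.to_csv(
            local_path,
            mode="w" if first else "a",  # pandas file mode, not awswrangler's "mode"
            header=first,                # write the header row only once
            index=False,
        )
    return local_path


# Two small chunks stand in for a large source read in pieces.
chunks = [
    pd.DataFrame({"id": [1, 2], "value": ["a", "b"]}),
    pd.DataFrame({"id": [3], "value": ["c"]}),
]
local_path = os.path.join(tempfile.mkdtemp(), "combined.csv")
append_chunks_to_csv(chunks, local_path)

# Read the file back to confirm that all chunks landed in a single CSV.
combined = pd.read_csv(local_path)
```

One possible workaround, assuming local disk space is available (which may be limited on AWS Lambda), is to build the file locally like this and then send it to S3 in a single call such as wr.s3.upload(local_file=local_path, path=target). That replaces the object rather than appending to it in place.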