change the name of "mode" argument to awswrangler.s3.to_csv

Open goleash-4alight opened this issue 2 years ago • 5 comments

Describe the bug

The method awswrangler.s3.to_csv accepts both a "mode" argument and **pandas_kwargs. The "mode" argument is not passed through to Pandas; it is consumed by awswrangler itself, which also requires dataset=True for "mode" to take effect. In some cases, it would be useful to pass this argument through to Pandas.

If there is already a way to pass "mode" to Pandas, a documentation update would resolve this issue.

How to Reproduce

import awswrangler as wr

# ...
# load a DataFrame and name it df
# ...

# Pandas "mode"
wr.s3.to_csv(df, "some_test_file_name", mode="a", header=False)

# awswrangler expects mode="append", dataset=True

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

AWS Lambda x86_64 Architecture

Python version

3.10

AWS SDK for pandas version

3.2.0

Additional context

No response

Jul 25 '23 21:07 goleash-4alight

To avoid a breaking change we can consider introducing a one-off parameter (pandas_mode for example) that is re-labeled as mode and passed to the underlying pandas method. I can open a PR and the team can discuss if this is how we'd like to move forward.
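
The relabeling idea could look roughly like the sketch below. It is only an illustration under stated assumptions, not the library's implementation: `relabel_pandas_mode` is a hypothetical helper, and `pandas_mode` is just the parameter name floated above, so the final `to_csv` signature may differ.

```python
from typing import Any, Dict, Optional


def relabel_pandas_mode(
    pandas_mode: Optional[str], pandas_kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the kwargs to forward to pandas, with pandas_mode renamed to mode.

    Hypothetical helper: awswrangler's own ``mode`` argument keeps its
    current meaning and is not touched here.
    """
    if "mode" in pandas_kwargs:
        # Guard against the caller supplying both spellings.
        raise ValueError("pass the pandas file mode via pandas_mode, not mode")
    forwarded = dict(pandas_kwargs)  # copy so the caller's dict is not mutated
    if pandas_mode is not None:
        forwarded["mode"] = pandas_mode
    return forwarded


# Kwargs that would be handed to DataFrame.to_csv for the call in the report.
forwarded = relabel_pandas_mode("a", {"header": False})
print(forwarded)
```

Keeping the rename in one small function leaves the existing `mode` behavior untouched, which is the point of avoiding a breaking change.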

Jul 26 '23 13:07 malachi-constant

That makes sense and sounds great -- thanks for the prompt response.

Jul 26 '23 13:07 goleash-4alight

Still investigating here, as this may require refactoring how we read existing S3 objects in order to support modes like append ("a").
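
To illustrate why append is harder than relabeling: assuming the target object can only be written whole (the usual case for standard S3 objects), emulating "a" means reading the existing body, adding the new rows, and writing everything back. The sketch below uses a plain dict as a stand-in for a bucket; `FakeBucket` and `append_rows` are hypothetical names, and no awswrangler or boto3 calls are involved.

```python
import csv
import io
from typing import Dict, Iterable, List

# Stand-in for a bucket: object key -> object body. Purely illustrative.
FakeBucket = Dict[str, bytes]


def append_rows(bucket: FakeBucket, key: str, rows: Iterable[List[str]]) -> None:
    """Emulate file mode "a" on a store that only supports whole-object writes."""
    existing = bucket.get(key, b"").decode("utf-8")  # 1. read the current body
    buffer = io.StringIO()
    buffer.write(existing)
    if existing and not existing.endswith("\n"):
        buffer.write("\n")  # keep new rows from fusing with the last line
    csv.writer(buffer, lineterminator="\n").writerows(rows)  # 2. add the rows
    bucket[key] = buffer.getvalue().encode("utf-8")  # 3. write it all back


bucket: FakeBucket = {"target.csv": b"id,name\n1,a\n"}
append_rows(bucket, "target.csv", [["2", "b"], ["3", "c"]])
print(bucket["target.csv"].decode("utf-8"))
```

A read-modify-write cycle like this is not atomic, so two concurrent writers could overwrite each other's rows; that is one reason it would need care before being exposed as a pandas-style "a" mode.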

Jul 26 '23 17:07 malachi-constant

This may not be necessary at all if there is a way to append to an existing file. For example, this seems to fail silently (it doesn't append the DataFrame "df" to the S3 file "target" and does not raise an exception):

wr.s3.to_csv(df, target, dataset=True, mode='append')

Jul 31 '23 20:07 goleash-4alight

Is there a different way to append to an existing file? We use pandas "append" mode primarily with pandas "chunksize" for large text files. If there's a different way to do this using awswrangler, "chunksize" and "mode" for pandas are not necessary. We're looking for a way to append to an existing file rather than writing a new file in the target directory.

Oct 04 '23 14:10 ggoleash-4-ats
