
Add `@transmit_format` in `flatten`

Open alvarobartt opened this issue 3 years ago • 3 comments

As suggested by @mariosasko in https://github.com/huggingface/datasets/pull/4411, we should add the `@transmit_format` decorator to `flatten`, `rename_column`, and `rename_columns` to ensure that the value of `_format_columns` in an `ArrowDataset` is properly updated.

Edit: according to @mariosasko's comment below, the `@transmit_format` decorator doesn't handle column renaming, so for `rename_column` and `rename_columns` the update is done manually instead.
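To illustrate the idea, here is a minimal, self-contained sketch of a format-transmitting decorator. It is not the `datasets` implementation: `ToyDataset`, `transmit_format_sketch`, and the name-based bookkeeping are all hypothetical stand-ins, assumed only for illustration. Under that assumption, the sketch shows how `flatten` can carry the format over to the new sub-columns, and one plausible way a rename could leave the format columns wrong.

```python
import functools


def transmit_format_sketch(func):
    """Hypothetical decorator: copy the format columns onto the returned dataset.

    Assumption: columns that were NOT formatted before the call are tracked by
    name, and every other column of the result is treated as formatted.
    """

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if self._format_columns is None:
            unformatted = set()
        else:
            unformatted = set(self.column_names) - set(self._format_columns)
        out = func(self, *args, **kwargs)
        if self._format_columns is not None:
            out._format_columns = [
                name for name in out.column_names if name not in unformatted
            ]
        return out

    return wrapper


class ToyDataset:
    """Hypothetical stand-in for a dataset; not the `datasets` API."""

    def __init__(self, columns, format_columns=None):
        self.columns = dict(columns)  # column name -> list of values
        self._format_columns = format_columns  # None means "all columns"

    @property
    def column_names(self):
        return list(self.columns)

    @transmit_format_sketch
    def flatten(self):
        # Expand each dict-valued column "a" into "a.<key>" columns.
        flat = {}
        for name, values in self.columns.items():
            if values and isinstance(values[0], dict):
                for key in values[0]:
                    flat[f"{name}.{key}"] = [v[key] for v in values]
            else:
                flat[name] = values
        return ToyDataset(flat)

    @transmit_format_sketch
    def rename_column(self, old, new):
        renamed = {(new if n == old else n): v for n, v in self.columns.items()}
        return ToyDataset(renamed)


ds = ToyDataset(
    {"a": [{"x": 1, "y": 2}], "b": [3], "c": [4]},
    format_columns=["a", "b"],  # "c" is deliberately left unformatted
)

flat = ds.flatten()
print(flat._format_columns)

# Renaming the unformatted column "c" to "d": the name-based bookkeeping no
# longer recognises it, so "d" ends up among the format columns.
renamed = ds.rename_column("c", "d")
print(renamed._format_columns)
```

In this toy version, the flattened sub-columns inherit the format as intended, while the renamed column slips into the format columns by mistake, which is the kind of case that would need manual handling.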

alvarobartt · Jun 14 '22 20:06

@mariosasko please let me know whether we need to include some sort of tests to make sure that the decorator is working as expected. Thanks! 🤗

alvarobartt · Jun 14 '22 20:06

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Hi, thanks for working on this! Yes, please add (simple) tests so we can avoid any unexpected behavior in the future.

`@transmit_format` doesn't handle column renaming, so I removed it from `rename_column` and `rename_columns` and added a comment to explain this.

mariosasko · Jun 15 '22 16:06

Oops, I thought this PR was already merged and deleted it from the source repository. I'll create a new branch off `main` to re-create this PR. My bad :weary:

alvarobartt · Sep 27 '22 11:09