
Add support for other delimiters in CsvExampleGen

Open pavanky opened this issue 4 years ago • 3 comments

System information

  • TFX Version (you are using): 1.0

  • Environment in which you plan to use the feature (e.g., Local (Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc..): Google Cloud

  • Are you willing to contribute it (Yes/No): Yes (Not me personally but someone from Twitter)

Describe the feature and the current behavior/state.

CsvExampleGen has "," hardcoded as the delimiter.

Will this change the current API? How?

Yes. Potentially adds new parameters to CsvExampleGen.

Who will benefit with this feature?

Generalizes CsvExampleGen so more people can start using the component.

Do you have a workaround or are completely blocked by this? :

We can use a custom ExampleGen that works for us internally.

Name of your Organization (Optional) Twitter

Any Other info.

pavanky avatar Aug 02 '21 20:08 pavanky

+1

There's a TODO to consider allowing users to configure parsing parameters, and the function used already allows passing a delimiter, so making this change is mostly a matter of surfacing the delimiter as a parameter in the component as an arg (or through a custom_config if supporting additional parsing parameters) and updating the function call.
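As a rough illustration of what "surfacing the delimiter as a parameter" could look like, here is a minimal sketch built on Python's standard `csv` module rather than on TFX internals. The function name `parse_line` and its `delimiter` argument are hypothetical and are not part of the TFX API; the default of "," is chosen only to mirror the current hard-coded behavior.

```python
import csv
from typing import List


def parse_line(line: str, delimiter: str = ",") -> List[str]:
    """Split one CSV record into cells using a configurable delimiter.

    Hypothetical helper for illustration only; not a TFX function.
    """
    # csv.reader accepts any iterable of lines, so wrap the single line in a list.
    return next(csv.reader([line], delimiter=delimiter))


# Default behavior stays comma-separated.
print(parse_line("a,b,c"))
# Tab-separated input only needs a different argument.
print(parse_line("a\tb\tc", delimiter="\t"))
# Quoted cells may contain the delimiter itself.
print(parse_line('1|"x|y"|3', delimiter="|"))
```

Keeping "," as the default would mean existing callers are unaffected, which is presumably why exposing a single optional argument is the smallest possible API change.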

codesue avatar Aug 02 '21 21:08 codesue

@pavanky

Could you please confirm whether this issue can be closed? Thanks.

UsharaniPagadala avatar Oct 06 '21 13:10 UsharaniPagadala

@UsharaniPagadala

This issue isn't resolved yet. The delimiter is still hard-coded: https://github.com/tensorflow/tfx/blob/master/tfx/components/example_gen/csv_example_gen/executor.py#L199

codesue avatar Oct 06 '21 15:10 codesue

@pavanky,

Are you still looking for a resolution? We are planning to prioritise issues based on community interest. Please let us know if this issue still persists with the latest TFX 1.13 release so that we can work on fixing it. Currently, CsvExampleGen supports only the "," delimiter; for other delimiters, it is recommended to create a custom ExampleGen component. Thank you for your contributions.
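For readers who do not want to write a custom component, one possible alternative (an assumption on my part, not something suggested in this thread) is to rewrite the input as comma-separated text before handing it to CsvExampleGen. A minimal sketch with the standard `csv` module follows; `rewrite_with_commas` is a hypothetical helper name.

```python
import csv
import io


def rewrite_with_commas(src_text: str, delimiter: str) -> str:
    """Return src_text re-encoded as comma-separated CSV.

    Hypothetical pre-processing helper; not part of TFX.
    """
    out = io.StringIO()
    # Use "\n" line endings so the output is predictable across platforms.
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    for row in csv.reader(io.StringIO(src_text), delimiter=delimiter):
        # Cells that contain a comma are quoted automatically by the writer.
        writer.writerow(row)
    return out.getvalue()


converted = rewrite_with_commas("a;b\n1;x,y\n", delimiter=";")
print(converted)
```

This costs an extra pass over the data and a second copy on disk, so it is likely only practical for modest file sizes.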

singhniraj08 avatar May 31 '23 07:05 singhniraj08

This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Jun 08 '23 02:06 github-actions[bot]

This issue was closed due to lack of activity after being marked stale for the past 7 days.

github-actions[bot] avatar Jun 16 '23 02:06 github-actions[bot]