[DataPipe] key renamer
This PR adds a filter that renames keys in training samples represented as dictionaries. It is particularly useful for webdataset-style datasets, but it also works with any other iterator over dictionaries.
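To make the idea concrete, here is a minimal plain-Python sketch of key renaming over an iterator of dict samples. It is not the torchdata DataPipe API: the function name `rename_keys`, its `renames` parameter (an exact old-key to new-key mapping), and the sample keys are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator


def rename_keys(samples: Iterable[dict], renames: Dict[str, str]) -> Iterator[dict]:
    """Hypothetical helper: yield a new dict per sample with keys renamed.

    `renames` maps an exact old key to a new key; keys not listed pass through
    unchanged. If two old keys map to the same new key, the later one wins.
    """
    for sample in samples:
        yield {renames.get(key, key): value for key, value in sample.items()}


# Hypothetical sample whose keys are file-extension style names.
samples = [{"__key__": "0001", "jpg": b"...", "cls": 3}]
out = list(rename_keys(samples, {"jpg": "image", "cls": "label"}))
```

Because it is a generator, samples are renamed lazily as the pipeline pulls them, which matches how dictionary iterators are usually consumed.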
Please switch the order of the inputs: 'pattern' -> 'new name' looks more natural.
The usual usage is with keyword arguments, using a simple key as the output and a pattern as the input. This also parallels assignment (new_name = pattern). I think this order is more useful. What do you think?
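The keyword-argument order described above can be sketched as follows, assuming shell-style patterns matched with Python's `fnmatch`. The function name `rename_keys_kw` and the pattern syntax are illustrative assumptions, not the PR's actual implementation.

```python
import fnmatch
from typing import Iterable, Iterator


def rename_keys_kw(samples: Iterable[dict], **new_name_to_pattern: str) -> Iterator[dict]:
    """Hypothetical helper: keyword = new key name, value = shell-style pattern.

    Reads like assignment: image="*.jpg" means "image = the key matching *.jpg".
    Keys that match no pattern are kept; the first matching pattern wins.
    """
    for sample in samples:
        renamed = {}
        for key, value in sample.items():
            for new_name, pattern in new_name_to_pattern.items():
                if fnmatch.fnmatchcase(key, pattern):
                    key = new_name
                    break
            renamed[key] = value
        yield renamed


samples = [{"__key__": "0001", "image.jpg": b"...", "label.cls": 3}]
out = list(rename_keys_kw(samples, image="*.jpg", label="*.cls"))
```

One trade-off of this order: keyword names must be valid Python identifiers, so a new name like `"seg-mask"` would have to be passed by unpacking a dict (`**{"seg-mask": "*.png"}`).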
In my opinion it makes sense to have two datapipes:
pattern_filter_keys
-> takes patterns and throws away all non-matching keys #406
and
pattern_rename_keys
-> takes a pattern -> new_name dictionary and renames keys accordingly. This way both datapipes follow the same API pattern and are easy to remember.
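The two-datapipe proposal can be sketched with plain generators, assuming `fnmatch` shell-style patterns. The names `pattern_filter_keys` and `pattern_rename_keys` come from the proposal, but their signatures here (a pattern list, and a `{pattern: new_name}` dict) are assumptions for illustration, not the actual datapipes.

```python
import fnmatch
from typing import Dict, Iterable, Iterator, Sequence


def pattern_filter_keys(samples: Iterable[dict], patterns: Sequence[str]) -> Iterator[dict]:
    """Assumed signature: keep only keys matching at least one pattern."""
    for sample in samples:
        yield {
            key: value
            for key, value in sample.items()
            if any(fnmatch.fnmatchcase(key, p) for p in patterns)
        }


def pattern_rename_keys(samples: Iterable[dict], renames: Dict[str, str]) -> Iterator[dict]:
    """Assumed signature: rename keys via {pattern: new_name}; first match wins."""
    for sample in samples:
        renamed = {}
        for key, value in sample.items():
            new_key = next(
                (name for pat, name in renames.items() if fnmatch.fnmatchcase(key, pat)),
                key,  # no pattern matched: keep the original key
            )
            renamed[new_key] = value
        yield renamed


samples = [{"__key__": "0001", "image.jpg": b"...", "label.cls": 3, "extra.txt": "x"}]
kept = pattern_filter_keys(samples, ["*.jpg", "*.cls"])
out = list(pattern_rename_keys(kept, {"*.jpg": "image", "*.cls": "label"}))
```

Both functions take patterns as their primary input, which is the API symmetry the proposal argues makes them easy to remember; chaining them filters first and then renames whatever survives.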