Felix Schneider
There are several issues with the documentation and the `.pyi` stub files: - The documentation does not mention decoders at all. - In the stub file for `Tokenizer`, all of...
It would be good if there were a `processors.Sequence`, similar to `pre_tokenizers.Sequence`. Right now, if I want to make a byte-level BPE tokenizer similar to RoBERTa, but with a different...
In some cases, some information in the text should be redacted in the review copy but present in the final copy. It would be nice to have a...
### Describe the bug Using `Dataset.map(fn, batched=True)` allows resizing the dataset by returning a dict of lists, all of which must be the same size. If they are not the...
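To make the batched contract described above concrete, here is a minimal plain-Python sketch of a function of the kind one would pass as `fn`. It does not call the `datasets` library itself; the function name `split_into_sentences` and the column names `text`, `sentence`, and `source_index` are hypothetical and chosen only for illustration. It receives a dict of lists and returns a dict of lists with a different number of rows, checking that all returned lists are the same size.

```python
from typing import Dict, List


def split_into_sentences(batch: Dict[str, List[str]]) -> Dict[str, list]:
    """Batched function: one input row may produce several output rows."""
    sentences: List[str] = []
    source_index: List[int] = []
    for i, text in enumerate(batch["text"]):  # "text" is a hypothetical column
        for part in text.split(". "):
            part = part.strip().rstrip(".")
            if part:
                sentences.append(part)
                source_index.append(i)  # remember which input row it came from
    out = {"sentence": sentences, "source_index": source_index}
    # The returned lists must all be the same size.
    assert len({len(column) for column in out.values()}) == 1
    return out


# A batch of 2 rows becomes a batch of 4 rows.
batch = {"text": ["One. Two. Three.", "Four."]}
result = split_into_sentences(batch)
```

Under these assumptions, the function would be passed as `Dataset.map(split_into_sentences, batched=True)`, probably together with dropping the original columns, since their length no longer matches the new row count.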
**Is your feature request related to a problem? Please describe.** Using batch mapping, we can easily split examples. However, we lack an appropriate option for merging them back together by...