byt5
Convert to bytes
Hi,
Could you share the exact script you used to convert strings into UTF-8 bytes?
Hi @NoviScl,
Really good question! After "some" searching, I'm 100% sure that seqio is used, more precisely its ByteVocabulary implementation here:
https://github.com/google/seqio/blob/3fd3175537540f8e0ce7579d9ae7936721adc05d/seqio/vocabularies.py#L349
This class is then initialized here in the byt5 library:
https://github.com/google-research/byt5/blob/2f46814cbc22e2814db0fcdd48639ea7e3293c67/byt5/tasks.py#L44-L45
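If you just want to reproduce the conversion without pulling in seqio, here is a minimal plain-Python sketch. It assumes that, as in seqio's ByteVocabulary, ids 0, 1 and 2 are reserved for pad, EOS and UNK, so each UTF-8 byte value is shifted up by 3. The function names are hypothetical, and things like extra_ids handling or appending EOS (done by the task preprocessors, not the vocabulary) are left out, so please double-check against the linked source.

```python
from typing import List

# Assumption: ids 0, 1 and 2 are reserved for pad, EOS and UNK, so every
# raw byte value b (0-255) is mapped to id b + 3.
NUM_SPECIAL_TOKENS = 3


def encode_to_byte_ids(text: str) -> List[int]:
    """UTF-8 encode `text` and shift each byte past the special-token ids."""
    return [b + NUM_SPECIAL_TOKENS for b in text.encode("utf-8")]


def decode_byte_ids(ids: List[int]) -> str:
    """Invert encode_to_byte_ids, dropping ids outside the byte range."""
    raw = bytes(
        i - NUM_SPECIAL_TOKENS
        for i in ids
        if NUM_SPECIAL_TOKENS <= i < NUM_SPECIAL_TOKENS + 256
    )
    # errors="ignore" skips byte sequences that are not valid UTF-8.
    return raw.decode("utf-8", errors="ignore")


print(encode_to_byte_ids("hi"))  # [107, 108]
print(encode_to_byte_ids("é"))   # [198, 172]  (é is the two bytes 0xC3 0xA9)
print(decode_byte_ids(encode_to_byte_ids("héllo")))  # héllo
```

Note that a non-ASCII character like "é" becomes two ids, which is why ByT5 sequences get longer than the character count for non-English text.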