Convert to bytes

Open: NoviScl opened this issue 4 years ago • 1 comment

Hi,

May I check the exact script that you used to convert strings into UTF-8 bytes?

NoviScl, Sep 13 '21 00:09

Hi @NoviScl,

Really good question. After "some" searching, I'm 100% sure that seqio is used. More precisely, it is the ByteVocabulary implementation from here:

https://github.com/google/seqio/blob/3fd3175537540f8e0ce7579d9ae7936721adc05d/seqio/vocabularies.py#L349

This class is then instantiated here in the byt5 library:

https://github.com/google-research/byt5/blob/2f46814cbc22e2814db0fcdd48639ea7e3293c67/byt5/tasks.py#L44-L45

stefan-it, Sep 27 '21 14:09
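
For readers who want to see the idea without installing seqio, here is a minimal pure-Python sketch of byte-level encoding in the spirit of ByteVocabulary. It is not the seqio code itself. The function names `encode` and `decode` and the constant `NUM_SPECIAL_TOKENS` are hypothetical, and the offset of 3 (IDs 0, 1, 2 reserved for padding, EOS, and UNK) is an assumption that should be checked against the linked source.

```python
from typing import List

# Assumption: the first three IDs are reserved for special tokens
# (pad, EOS, UNK), so every raw byte value is shifted up by 3.
NUM_SPECIAL_TOKENS = 3
BYTE_SIZE = 256


def encode(text: str) -> List[int]:
    """Turn a string into token IDs: UTF-8 bytes plus the special-token offset."""
    return [b + NUM_SPECIAL_TOKENS for b in text.encode("utf-8")]


def decode(ids: List[int]) -> str:
    """Map token IDs back to a string, skipping IDs outside the byte range."""
    raw = bytes(
        i - NUM_SPECIAL_TOKENS
        for i in ids
        if NUM_SPECIAL_TOKENS <= i < BYTE_SIZE + NUM_SPECIAL_TOKENS
    )
    # Ignore incomplete multi-byte sequences rather than raising.
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = encode("héllo")
    print(ids)
    print(decode(ids))
```

Non-ASCII characters expand to several IDs because they occupy several UTF-8 bytes, so the ID sequence is usually longer than the string.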