DanielHesslow
I'm not particularly familiar with the Hugging Face code base, and I don't currently have the time to read up on the specifics. The format used during training is: ```...
`tokenizer = AutoTokenizer.from_pretrained("lightonai/RITA_s")` is indeed the correct tokenizer; the vocab size is 26.
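A minimal sketch of loading that tokenizer and checking the stated vocab size (assumes the `transformers` library is installed and the Hugging Face Hub is reachable):

```python
# Sketch: load the RITA_s tokenizer and inspect its vocabulary size.
# Assumes `transformers` is installed and the Hub is reachable.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lightonai/RITA_s")

# Per the thread, the RITA vocabulary has 26 tokens
# (the amino-acid alphabet plus special tokens).
print(tokenizer.vocab_size)
```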
This remapping is unfortunately not correct for all tokenizers, and there isn't actually a single universal mapping. Doing it correctly requires treating each internal decoder separately. It's very possible, but it...