DanielHesslow


I'm not particularly familiar with the Hugging Face code base, and I do not currently have the time to read up on the specifics. The format used during training is: ```...

`tokenizer = AutoTokenizer.from_pretrained("lightonai/RITA_s")` is indeed the correct tokenizer. The vocab size is 26.
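A vocab size of 26 points to a character-level tokenizer over the amino-acid alphabet plus a handful of special tokens. The sketch below is purely illustrative, not RITA's actual implementation: the class name, choice of special tokens, and id assignments are all assumptions (which is why this sketch ends up with 23 tokens rather than 26).

```python
# Hypothetical character-level amino-acid tokenizer, for illustration only.
# Assumption: one token id per residue letter plus a few special tokens;
# RITA's real tokenizer comes from AutoTokenizer.from_pretrained(...).

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

class CharTokenizer:
    def __init__(self, alphabet):
        # Reserve special tokens first, then one id per character.
        self.specials = ["<pad>", "<unk>", "<eos>"]
        self.vocab = {tok: i for i, tok in enumerate(self.specials)}
        for ch in alphabet:
            self.vocab[ch] = len(self.vocab)
        self.inv = {i: t for t, i in self.vocab.items()}

    def encode(self, seq):
        # Unknown characters map to <unk>.
        unk = self.vocab["<unk>"]
        return [self.vocab.get(ch, unk) for ch in seq]

    def decode(self, ids):
        return "".join(self.inv[i] for i in ids)

tok = CharTokenizer(AMINO_ACIDS)
print(len(tok.vocab))              # 23 in this sketch; RITA's is 26
print(tok.encode("MKT"))           # one id per residue
print(tok.decode(tok.encode("MKT")))
```

The real tokenizer's special tokens and id order will differ; the point is only that each amino acid is a single token, so sequence length in tokens equals sequence length in residues.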

This remapping is unfortunately not correct for all tokenizers, and there isn't actually a single mapping. Doing it correctly requires treating each internal decoder separately. It's very possible, but it...