nostalgebraist
Thanks for the reply! I basically agree with everything you say here. I am using GPT-2 with the maximum text length (1024 tokens), so there is little room to improve...
Cool, thanks! Could you give me edit permissions for the branch on your fork ([see here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork))? As you anticipated in your LW message, I'd like to edit your code a...
The CUDA error happens when the tokenizer produces token indices that are too big for the text embedding table. This fixes it:

```python
VOCAB_SIZE = tokenizer.vocab_size
```
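For context, a minimal sketch of why the mismatch crashes (the numbers are hypothetical, not from the actual run): an embedding is just a row lookup, so any token id at or beyond the table's row count is out of range. On CPU this raises a Python `IndexError`; on CUDA it typically surfaces as the much less readable "device-side assert triggered" error.

```python
# Hypothetical sizes: the model's embedding has fewer rows than the
# largest id the tokenizer can emit (e.g. after adding special tokens).
vocab_rows = 50257   # rows the embedding table was built with
token_id = 50300     # an id the tokenizer produces

# An embedding table is conceptually a list of row vectors.
table = [[0.0] * 4 for _ in range(vocab_rows)]

try:
    row = table[token_id]  # same lookup an embedding layer performs
except IndexError:
    # On GPU this would instead appear as a CUDA device-side assert.
    print("token id out of range: size the table from tokenizer.vocab_size")
```

Building the embedding with `VOCAB_SIZE = tokenizer.vocab_size` keeps the table at least as large as any id the tokenizer can emit, which avoids the out-of-range lookup entirely.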