nanoGPT
How to change vocabulary size/number of tokens when tokenizing openwebtext?
Hi,
Is there an elegant way to change the vocabulary size used when tokenizing the openwebtext dataset?