How to change vocabulary size/number of tokens when tokenizing openwebtext?

Open hhroberthdaniel opened this issue 2 years ago • 0 comments

Hi,

Is there an elegant way to change the vocabulary size for the openwebtext dataset?

hhroberthdaniel, Feb 22 '23 14:02