
Predict #TidyTuesday NYT bestsellers | Julia Silge

Open utterances-bot opened this issue 2 years ago • 2 comments

Predict #TidyTuesday NYT bestsellers | Julia Silge

A data science blog

https://juliasilge.com/blog/nyt-bestsellers/

utterances-bot avatar Jun 30 '22 02:06 utterances-bot

Hello Julia,

I am working my way through your SMLTAR book, and I have a question about max_tokens. How do you decide what is an appropriate number to use in the model? In some of your other videos, you've gone as high as 1000 and as low as 100. In a real-world problem, what are some of the best tips for picking the correct number of tokens?

Thanks!

gunnergalactico avatar Jun 30 '22 02:06 gunnergalactico

@gunnergalactico For something like "regular" natural language, I start on the higher side (in the thousands) because the vocabulary is larger. For some of the examples I work through that have very constrained vocabularies, like this example of names, going with a smaller number of tokens is better. Overall, though, it's good to realize that the number of tokens is really a hyperparameter of the model and you can tune it.

juliasilge avatar Jun 30 '22 17:06 juliasilge
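
To make the vocabulary-size point above concrete, here is a minimal sketch in plain Python (not the R tidymodels workflow the blog post uses). It measures what share of all token occurrences is kept when only the most frequent max_tokens tokens are retained. The function name `token_coverage` and both toy corpora are hypothetical and made up for illustration. In the R workflow, the analogous setting would be the `max_tokens` argument of textrecipes' `step_tokenfilter()`, which can be marked with `tune()` to treat it as a hyperparameter.

```python
from collections import Counter


def token_coverage(docs, max_tokens):
    """Share of all token occurrences covered by the max_tokens most frequent tokens.

    Tokenization here is a naive lowercase whitespace split (an assumption
    made for this sketch, not what any particular library does).
    """
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for _, n in counts.most_common(max_tokens))
    return kept / total


# Hypothetical toy corpora: a constrained vocabulary (names) and freer prose.
names = ["john smith", "mary smith", "john brown", "mary brown"]
prose = [
    "the quick brown fox jumps",
    "a lazy dog sleeps all day",
    "the fox and the dog meet",
]

# Compare how quickly each corpus is covered as the token budget grows.
for k in (2, 4, 8, 16):
    print(k, round(token_coverage(names, k), 2), round(token_coverage(prose, k), 2))
```

With a constrained vocabulary, a small token budget already covers everything, while freer text keeps gaining coverage as the budget grows. Coverage is only a rough guide, though; as the reply above notes, the dependable way to choose the number is to tune it against model performance.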