[405b-SUT] Max number of output tokens

Open • attafosu opened this issue 11 months ago • 1 comment

For 405B, the sampling parameter config sets the max output tokens to 20k. However, the reference output distribution has a max output length of 1.7k, so I don't think this sampler parameter should be set that high. @nvzhihanj @arjunsuresh @mrmhodak

attafosu, Jan 28 '25 22:01

max_new_tokens should be 2000 (the max input length is 20000), so this looks like a typo. Can you help submit a PR to patch it?

nvzhihanj, Jan 28 '25 22:01
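
For anyone picking up the patch, here is a minimal sketch of a sanity check that would have flagged the typo. It is not taken from the reference implementation: the function name `check_output_cap` and the constant names are hypothetical, and only the numbers (20000 max input length, 2000 intended cap, 1.7k longest reference output) come from this thread.

```python
# Hypothetical helper, not part of the reference implementation.
# The numeric values are the ones quoted in this thread.
MAX_INPUT_TOKENS = 20_000      # max input length quoted in the reply
INTENDED_MAX_NEW_TOKENS = 2_000  # intended output cap quoted in the reply
REFERENCE_MAX_OUTPUT = 1_700   # longest reference output quoted in the issue


def check_output_cap(max_new_tokens,
                     reference_max_output=REFERENCE_MAX_OUTPUT,
                     max_input_tokens=MAX_INPUT_TOKENS):
    """Return a list of warnings about a proposed output-token cap."""
    warnings = []
    if max_new_tokens < reference_max_output:
        # A cap below the longest reference output would truncate valid answers.
        warnings.append("cap is below the longest reference output")
    if max_new_tokens >= max_input_tokens:
        # A cap as large as the input limit suggests the two values were mixed up.
        warnings.append("cap is at or above the max input length")
    return warnings


if __name__ == "__main__":
    for cap in (20_000, INTENDED_MAX_NEW_TOKENS):
        print(cap, check_output_cap(cap))
```

With the values above, the 20000 cap triggers the input-length warning, while 2000 passes both checks and still leaves headroom over the 1.7k reference maximum.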