[405b-SUT] Max number of output tokens
For 405B, the sampling parameter config sets the max output tokens to 20k. However, the reference output distribution has a max output length of about 1.7k, so I don't think this sampler parameter should be set that high. @nvzhihanj @arjunsuresh @mrmhodak
max_new_tokens should be 2000; 20000 is the max input length, so this looks like a typo. Can you help submit a PR to patch it?
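For illustration only, a minimal sketch of a sanity check that would flag this kind of typo. The function name `check_output_cap` and the constants are hypothetical and are not part of the SUT code; the numbers are the ones quoted in this thread (max input length 20000, longest reference output about 1.7k).

```python
# Hypothetical sanity check; values come from this thread, not from the SUT config.
MAX_INPUT_TOKENS = 20_000      # max input length cited in the reply
REFERENCE_MAX_OUTPUT = 1_700   # approx. longest reference output cited in the issue


def check_output_cap(max_new_tokens,
                     reference_max=REFERENCE_MAX_OUTPUT,
                     max_input=MAX_INPUT_TOKENS):
    """Return a list of warnings for a suspicious output-token cap."""
    warnings = []
    if max_new_tokens < reference_max:
        # The cap would truncate outputs as long as the longest reference output.
        warnings.append("cap is below the longest reference output")
    if max_new_tokens >= max_input:
        # An output cap equal to the input limit suggests the two were mixed up.
        warnings.append("cap is as large as the max input length; possible typo")
    return warnings


if __name__ == "__main__":
    for cap in (20_000, 2_000):
        print(cap, check_output_cap(cap))
```

Under these assumptions, a cap of 20000 is flagged, while 2000 passes with some headroom above the 1.7k reference maximum.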