Default `top_k` value (or something else) causes nonsensical outputs
### Bug description

With the current default `top_k` setting (`top_k=50`), we get nonsensical results most (but not all) of the time:
```python
from litgpt import LLM

llm = LLM.load("mistralai/Mathstral-7B-v0.1")

print(llm.generate("What is 1+2?"))
# In your job, what is the average time people spend using your app?
# These can have wildly different answers depending upon the dataset,
# the timeframe for the dataset, and how the data is being calculated or stated, disaggregated and re

print(llm.generate("What is 1+2?", top_k=1))
# 1+2 equals 3.

del llm
llm = LLM.load("microsoft/phi-2")

llm.generate("What do Llamas eat?")
# ' Curation Level:\n'

llm.generate("What do Llamas eat?")
# ' Peculiar\n'

llm.generate("What do Llamas eat?")
# ' Llamas eat grass, leaves, and shrubs.
# They prefer to graze on coarse vegetation and can
# consume as much as 30% of their body weight in a single day.\n'

llm.generate("What do Llamas eat?", top_k=1)
# ' Llamas are herbivores and primarily eat grass, hay, and other plant material.
# They have a unique digestive system that allows them to efficiently extract
# nutrients from fibrous plant material.\n'
```
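For context on why `top_k=1` behaves so differently: top-k sampling keeps only the k highest-probability tokens and samples among them, so `top_k=1` degenerates to greedy (deterministic) decoding, while `top_k=50` can draw any of the 50 most likely tokens at each step. A minimal sketch of the general technique (illustration only, not necessarily LitGPT's exact implementation):

```python
import torch

def sample_top_k(logits: torch.Tensor, top_k: int) -> int:
    """Keep the top_k highest logits, renormalize, and sample one token id."""
    # The k-th largest logit is the cutoff; mask out everything below it.
    values, _ = torch.topk(logits, top_k)
    filtered = torch.where(
        logits < values[-1], torch.full_like(logits, float("-inf")), logits
    )
    # Sample from the renormalized distribution over the surviving tokens.
    probs = torch.softmax(filtered, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# With top_k=1 only the argmax survives, so the output is deterministic;
# with top_k=50 any of the 50 candidates can be drawn each step, which is
# normal sampling behavior -- it should wander, but not produce nonsense.
```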
However, I don't remember it being this bad before. It also seems to be this bad only in the Python API. I wonder if this was perhaps caused by the updated KV-cache settings in #1590.
CC @Andrei-Aksionov
### What operating system are you using?

Linux

### LitGPT Version

0.4.5
Thank you for adding "(or something else)"; this is the point I was trying to make in the PR.
A simple test to rule out the KV cache would be to run with and without it, since the KV cache should only affect speed, not the outputs.
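A related check that needs no extra flags: with `top_k=1` decoding is greedy and should be deterministic, so two identical calls that disagree point at leftover state (for example a stale KV cache) rather than sampling noise. A sketch, using only the API calls shown above:

```python
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")

# Greedy decoding: both calls should return exactly the same text.
first = llm.generate("What do Llamas eat?", top_k=1)
second = llm.generate("What do Llamas eat?", top_k=1)

# If this fails, state is leaking between calls, independent of sampling.
assert first == second, "greedy outputs differ -> leftover state between calls"
```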
@awaelchli Thanks! I am fairly certain now that it was caused by an incomplete KV-cache clearing (#1596).
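For readers landing here: "incomplete clearing" means some per-call decoding state survives into the next `generate()` call. A toy illustration of the failure mode (not LitGPT's actual cache code):

```python
class ToyKVCache:
    """Toy per-layer cache: appended keys/values plus a position counter."""

    def __init__(self):
        self.keys, self.values = [], []
        self.input_pos = 0

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        self.input_pos += 1

    def reset(self):
        self.keys.clear()
        self.values.clear()
        # An *incomplete* reset would forget this line: the next prompt would
        # then decode at a shifted position while attending to the previous
        # call's entries, producing off-topic output like the samples above.
        self.input_pos = 0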
We addressed this in #1596.