Default `top_k` value (or something else) causes nonsensical outputs
### Bug description

With the current default `top_k` setting (`top_k=50`), we get nonsensical results most (but not all) of the time:
```python
from litgpt import LLM

llm = LLM.load("mistralai/Mathstral-7B-v0.1")

print(llm.generate("What is 1+2?"))
# In your job, what is the average time people spend using your app?
# These can have wildly different answers depending upon the dataset,
# the timeframe for the dataset, and how the data is being calculated or stated, disaggregated and re

print(llm.generate("What is 1+2?", top_k=1))
# 1+2 equals 3.

del llm
llm = LLM.load("microsoft/phi-2")

llm.generate("What do Llamas eat?")
# ' Curation Level:\n'

llm.generate("What do Llamas eat?")
# ' Peculiar\n'

llm.generate("What do Llamas eat?")
# ' Llamas eat grass, leaves, and shrubs.
# They prefer to graze on coarse vegetation and can
# consume as much as 30% of their body weight in a single day.\n'

llm.generate("What do Llamas eat?", top_k=1)
# ' Llamas are herbivores and primarily eat grass, hay, and other plant material.
# They have a unique digestive system that allows them to efficiently extract
# nutrients from fibrous plant material.\n'
```
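For context on why `top_k=1` behaves so differently: top-k sampling keeps only the k highest-probability tokens and samples among them, so `top_k=1` degenerates to greedy (deterministic) decoding, while `top_k=50` can draw any of the 50 most likely tokens at each step. A minimal sketch of the general technique (illustration only, not necessarily LitGPT's exact implementation):

```python
import torch

def sample_top_k(logits: torch.Tensor, top_k: int) -> int:
    """Keep the top_k highest logits, renormalize, and sample one token id."""
    # The k-th largest logit is the cutoff; mask out everything below it.
    values, _ = torch.topk(logits, top_k)
    filtered = torch.where(
        logits < values[-1], torch.full_like(logits, float("-inf")), logits
    )
    # Sample from the renormalized distribution over the surviving tokens.
    probs = torch.softmax(filtered, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# With top_k=1 only the argmax survives, so the output is deterministic;
# with top_k=50 any of the 50 candidates can be drawn each step, which is
# normal sampling behavior -- it should wander, but not produce nonsense.
```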
However, I don't remember it being this bad before. It also seems to be this bad only in the Python API. I wonder if this was perhaps caused by the updated KV-cache settings in #1590.
CC @Andrei-Aksionov
### What operating system are you using?

Linux

### LitGPT Version

0.4.5
Thank you for adding "(or something else)"; this is the point I was trying to make in the PR.
A simple test to rule out the KV cache would be to run with and without it, since the KV cache should only affect speed, not the outputs.
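A related check that needs no extra flags: with `top_k=1` decoding is greedy and should be deterministic, so two identical calls that disagree point at leftover state (for example a stale KV cache) rather than sampling noise. A sketch, using only the API calls shown above:

```python
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")

# Greedy decoding: both calls should return exactly the same text.
first = llm.generate("What do Llamas eat?", top_k=1)
second = llm.generate("What do Llamas eat?", top_k=1)

# If this fails, state is leaking between calls, independent of sampling.
assert first == second, "greedy outputs differ -> leftover state between calls"
```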
@awaelchli Thanks! I am fairly certain now that it was caused by an incomplete KV-cache clearing (#1596).
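For readers landing here: "incomplete clearing" means some per-call decoding state survives into the next `generate()` call. A toy illustration of the failure mode (not LitGPT's actual cache code):

```python
class ToyKVCache:
    """Toy per-layer cache: appended keys/values plus a position counter."""

    def __init__(self):
        self.keys, self.values = [], []
        self.input_pos = 0

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)
        self.input_pos += 1

    def reset(self):
        self.keys.clear()
        self.values.clear()
        # An *incomplete* reset would forget this line: the next prompt would
        # then decode at a shifted position while attending to the previous
        # call's entries, producing off-topic output like the samples above.
        self.input_pos = 0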
We addressed this in #1596.