
The queries generated a lot of repetitions. Possible to provide the 1T "fix" again?

Open hiqsociety opened this issue 1 year ago • 6 comments

Seems like you will be redoing the training. I was wondering if you can redo it for the 0.5B and 1T checkpoints.

hiqsociety avatar Oct 20 '23 19:10 hiqsociety

They seem (according to the live charts) to have restarted at an earlier point in the training, but after the 1T checkpoint, so I don't think it would do much. Also, repetition is just an LLM issue at this point; especially at this small size, it's hardly shocking. I presume fine-tuners will do their magic in a couple of months, when the final model is released.

qaziquza avatar Oct 22 '23 06:10 qaziquza

I don't think the data-loading bug affected the 0.5B and 1T checkpoints; that's more of a tiny-model issue, along with bad sampling settings.

Increasing the repetition penalty and the temperature, and tuning top_k and top_p, should help.

VatsaDev avatar Oct 23 '23 01:10 VatsaDev

@VatsaDev would you mind showing the exact settings you used to get quality results? Example prompts and responses would be highly appreciated :)

hiqsociety avatar Oct 31 '23 14:10 hiqsociety

Well, my main experience is with Pythia-350M, GPT-2 Medium, and GPT-2 XL, and I don't have specific prompts to show off, but I do have a couple of screenshots in my UI PR, #30. As you can see, it has factual issues (it's the V1 chat model), but it did manage to give me valid information on Finnish airports, so it is definitely somewhat competent.

Most of my settings are:

  • For nonsensical text, drop top_k, usually to 20
  • For repetitive text, increase temp, usually to 0.8; I go to 1.2 when going for creativity and 0.3 for logic/reasoning/math
  • I don't use top_p much, but 0.95 has worked great (really natural text); it's 0 for greedy samples, which is good for factual output
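For intuition, the knobs above are applied to the model's raw logits before sampling. Here's a minimal pure-Python sketch of temperature scaling plus top-k filtering on a toy logits vector (illustrative only; the toy values and the function name are my own, not TinyLlama's actual pipeline):

```python
import math

def sample_filter(logits, temperature=0.8, top_k=20):
    """Apply temperature scaling and top-k filtering to raw logits,
    returning a normalized probability distribution (toy illustration)."""
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    # Keep only the top_k highest logits; mask the rest to -inf.
    threshold = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
    masked = [l if l >= threshold else float("-inf") for l in scaled]
    # Softmax over the surviving logits (exp(-inf) == 0, so masked
    # tokens get exactly zero probability).
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# With top_k=2, only the two strongest tokens keep any probability mass.
probs = sample_filter([2.0, 1.0, 0.5, -1.0], temperature=0.8, top_k=2)
```

Dropping top_k to 20 (versus a vocabulary of tens of thousands) is why it cuts down on nonsensical text: the long tail of unlikely tokens is simply excluded from sampling.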

VatsaDev avatar Nov 07 '23 23:11 VatsaDev

Yeah, quality and repetition are big issues with TinyLlama (say, versus DeepSeek 1.3B, a coding model, which should be worse on general-knowledge questions like this, but is in fact better):

deepseek-ai/deepseek-coder-1.3b-base:
Length of input is 29
Human: List the planets in our solar system. Respond only with the list of planets. 
Assistant: The planets are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune.
Human: ...

Trelis/TinyLlama-1.1B-4k-chat-SFT:
Length of input is 31
Human: List the planets in our solar system. Respond only with the list of planets. 
Assistant: The planets are Jupiter, Saturn, Mars, Venus, Mercury, Earth, and the Sun.
User: ...

Further, there's no way to get TinyLlama to generate an EOS token (after SFT) without setting a very high repetition penalty, in which case the answer is poor. Here's a repetition penalty of 5.0:

Trelis/TinyLlama-1.1B-4k-chat-SFT:
Length of input is 31
Human: List the planets in our solar system. Respond only with the list of planets. 
Assistant: The planets are Jupiter, Saturn and Uranus; Mercury is not a planet because it does NOT orbit around Earth as we do (it's too far away).
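A penalty of 5.0 is extreme. In the CTRL-style scheme that most samplers use, each already-generated token's logit is divided by the penalty if positive (or multiplied by it if negative), so at 5.0 any previously strong continuation is effectively banned, which explains the degraded answer. A toy sketch (the function and values are illustrative, not TinyLlama's code):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=5.0):
    """CTRL-style repetition penalty: shrink positive logits and
    amplify negative logits of tokens that were already generated."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Token 0 was already generated: its logit collapses from 3.0 to 0.6,
# so at penalty 5.0 repeats become very unlikely, at the cost of also
# suppressing legitimately repeated words (planet names, articles, etc.).
penalized = apply_repetition_penalty([3.0, 1.0, -0.5], [0], penalty=5.0)
```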

RonanKMcGovern avatar Nov 14 '23 10:11 RonanKMcGovern

@RonanKMcGovern I just tested all of TinyLlama's chat models (V0.1 to V0.6) and the model does not generate repetition. Not sure why that is the case for you? Below is a screenshot of V0.6: [Screenshot 2023-11-20 at 7 29 08 PM]

jzhang38 avatar Nov 20 '23 11:11 jzhang38