
mlx_lm with llama-3.3-70b-instruct works like a base model in some cases.

chigkim opened this issue 1 year ago • 3 comments

My prompt looks like this:

Provide a summary as well as a detail analysis of the following:
Then the content to summarize goes next.

However, if I run the following,

mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit --max-kv-size 30000 --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 --system 'You are a helpful assistant' --prompt - < ./28000.txt

I only get this:

"I hope this information has been helpful. If you have any further questions or need more information, please don't hesitate to ask."

I'm attaching the full prompt below.

28000.txt

Thanks!

chigkim avatar Dec 15 '24 20:12 chigkim

That's odd. Does it still fail if you don't specify --max-kv-size?

Is it just for that prompt or do you observe the same for shorter prompts? What about other Llama models or just the 70B?
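To isolate the variables above, the runs might look like the following. This is a sketch based on the command in the original report; short.txt is a hypothetical shorter prompt file, and the 8B model name is an assumed mlx-community repo, not one confirmed in this thread.

```shell
# 1. Same run, but without --max-kv-size.
mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 \
  --system 'You are a helpful assistant' --prompt - < ./28000.txt

# 2. Same settings, shorter prompt (short.txt is hypothetical).
mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --max-kv-size 30000 --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 \
  --system 'You are a helpful assistant' --prompt - < ./short.txt

# 3. Same prompt, smaller Llama, to see whether only the 70B is affected.
mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
  --max-kv-size 30000 --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 \
  --system 'You are a helpful assistant' --prompt - < ./28000.txt
```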

awni avatar Dec 17 '24 17:12 awni

I discovered this when I created a script to test speed with various prompt lengths.

What's interesting is that when feeding 28k, 30k, or 32k tokens, it has the same problem: it generates only 27 tokens, the same phrase each time. With prompts of 26k tokens or fewer, it didn't have the problem.
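A minimal sketch of how such a sweep might construct prompts of target lengths. A crude whitespace split stands in for the real Llama tokenizer here (an assumption for illustration); the actual script would feed each prompt to mlx_lm.generate and time the run.

```python
# Build prompts of roughly target_tokens "tokens", where a token is
# approximated by a whitespace-separated word (stand-in for the real
# tokenizer).
def build_prompt(base_text: str, target_tokens: int) -> str:
    """Repeat base_text until the prompt reaches target_tokens words."""
    words = base_text.split()
    out = []
    while len(out) < target_tokens:
        out.extend(words)
    return " ".join(out[:target_tokens])

# Sweep the lengths where the behavior flips (26k fine, 28k+ broken).
for target in (26_000, 28_000, 30_000, 32_000):
    prompt = build_prompt("The quick brown fox jumps over the lazy dog.", target)
    assert len(prompt.split()) == target
```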

I'm suspecting something might be going on with long context? It's like the opposite of the issue I created about a looping problem with long context and llama-3.1-8b-instruct-4bit.

I'll test some more with what you suggested and report back.

chigkim avatar Dec 17 '24 23:12 chigkim

I mentioned this in the other thread, but there was a bug with these Llama models causing duplicate BOS tokens that is now fixed. I wonder if that was impacting the results you see here?
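A toy illustration of the kind of bug described: the chat template already emits BOS, and the encode path prepends another one, so the model sees two BOS tokens. The token ids, template string, and stand-in tokenizer below are all hypothetical, not mlx_lm's actual code.

```python
BOS = 128000  # BOS token id in the Llama-3 family vocab

def apply_chat_template(user_text: str) -> str:
    # A chat template typically emits the BOS marker itself.
    return "<|begin_of_text|>" + user_text

def encode(text: str, add_bos: bool) -> list:
    # Stand-in tokenizer: BOS marker -> BOS id, each other word -> 1.
    ids = [1 for _ in text.replace("<|begin_of_text|>", "").split()]
    if text.startswith("<|begin_of_text|>"):
        ids = [BOS] + ids
    if add_bos:
        # Buggy path: prepend BOS again on top of the template's BOS.
        ids = [BOS] + ids
    return ids

buggy = encode(apply_chat_template("hello world"), add_bos=True)
fixed = encode(apply_chat_template("hello world"), add_bos=False)
assert buggy[:2] == [BOS, BOS]  # duplicate BOS degrades generation
assert fixed[:2] == [BOS, 1]    # single BOS after the fix
```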

awni avatar Jan 03 '25 23:01 awni