Add phi-3 checkpoint
- [ ] Verify Phi-3-mini-4k-instruct configs
- [ ] Add prompt style (rough sketch below)
- [ ] Add other config files
- [ ] Add test_model.py
- [ ] Add to test_prompts.py
- [ ] Update 2 tables in README
- [ ] Update download_model_weights.md
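For the prompt style item, a minimal sketch of what it could look like, assuming litgpt's `PromptStyle` base class in `litgpt/prompts.py` and the `<|user|>`/`<|end|>`/`<|assistant|>` markers from the HF model card (the class name and registration details are my assumptions):

```python
from litgpt.prompts import PromptStyle


class Phi3(PromptStyle):
    # Sketch only: follows the chat template shown on the HF model card,
    # i.e. "<|user|>\n{prompt}<|end|>\n<|assistant|>\n"
    def apply(self, prompt: str, **kwargs) -> str:
        return f"<|user|>\n{prompt}<|end|>\n<|assistant|>\n"
```

The style would still need to be registered in the prompt-style mapping and wired up to the checkpoint name, but the template itself is the main piece.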
There is a modeling_*.py file. Good luck 🙂.
Haha, I finally got the weights loaded, but of course it's never easy ... it's generating gibberish:
⚡ phi-3-checkpoint ~/litgpt litgpt chat --checkpoint_dir checkpoints/microsoft/Phi-3-mini-4k-instruct
Now chatting with Phi-3-mini-4k-instruct.
To exit, press 'Enter' on an empty prompt.
Seed set to 1234
>> Prompt: What do llamas eat?
>> Reply: epsonniformes }).selves }).SSIONunicívo }). EverythingFormsћassaiejalphutureievediennesenticaciónicaciónMilMinigh ninassaselvesselves exhaustselvesonnselvesktionΗracheracheionedΗ Avenoted Bij_+versionsmastevosepsselvesmobileselvesilleryassaucealphasseestoreselvesférFormsiej Mu Kaiser oppienngnatteversionsionedionedversionsSSIONectionaccoossFormassaselves_+uminatesonoSSIONológissancecenteecause_+ienn选uraleʋ Stepalphigosionaliilonverte }).ienn }).ativo Sternsonoiejuralassawnkademselves│uraleativaionedvos_+utschversionsponiej_+icacióniejiewerológvoasonverte shoutioned位ionedIdentmobi
Let the easter egg hunt begin 😭
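One way to hunt: compare logits against the HF implementation for the same token ids. A rough sketch, assuming the converted `lit_model.pth` already sits in the checkpoint dir and that a `Phi-3-mini-4k-instruct` entry exists in litgpt's config registry (both assumptions at this point):

```python
import torch
from transformers import AutoModelForCausalLM

from litgpt.config import Config
from litgpt.model import GPT

ckpt_dir = "checkpoints/microsoft/Phi-3-mini-4k-instruct"

# Reference model straight from the HF weights
# (add trust_remote_code=True if the installed transformers doesn't ship Phi-3 yet)
theirs = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", torch_dtype=torch.float32
).eval()

# litgpt model loaded from the converted weights
config = Config.from_name("Phi-3-mini-4k-instruct")
ours = GPT(config).eval()
ours.load_state_dict(torch.load(f"{ckpt_dir}/lit_model.pth", map_location="cpu"))

# Same random token ids through both; a large difference points at a bad weight mapping
x = torch.randint(0, config.padded_vocab_size, (1, 16))
with torch.no_grad():
    print((ours(x) - theirs(x).logits).abs().max())
```

If the max difference is large, forward hooks on each block narrow down which mapping is off.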
Some more tidbits via Daniel Han:
Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to @UnslothAI:
- Sliding window of 2047? Mistral v1 used 4096. So does Phi mini have SWA? (And an odd number?) Max RoPE position is 4096?
- Upcasted RoPE? Like Gemma?
- Dynamic RoPE for 128K context lengths
- Fused MLP & QKV - need to unfuse (rough split sketch below)
- MMLU evals are very different between the Phi team and the Llama-3 team - why?
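On the fused MLP & QKV point: the HF Phi-3 checkpoint stores single `self_attn.qkv_proj.weight` and `mlp.gate_up_proj.weight` tensors, so a converter targeting separate projections has to split them. A toy illustration using the mini variant's shapes (hidden size 3072, intermediate size 8192, taken from the HF config; the target layout depends on the implementation, so this only shows the HF-side split):

```python
import torch

hidden_size = 3072          # Phi-3-mini hidden size (from the HF config)
intermediate_size = 8192    # Phi-3-mini MLP intermediate size

# Stand-ins for model.layers.N.self_attn.qkv_proj.weight and
# model.layers.N.mlp.gate_up_proj.weight in the HF checkpoint
qkv_proj = torch.randn(3 * hidden_size, hidden_size)
gate_up_proj = torch.randn(2 * intermediate_size, hidden_size)

# HF stacks q, k, v (and gate, up) along dim 0, so splitting recovers them
q, k, v = qkv_proj.split(hidden_size, dim=0)
gate, up = gate_up_proj.split(intermediate_size, dim=0)
```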
Ok, it's becoming more interesting. Somewhat what I expected from Llama 3, but that one didn't deliver.
Looks like the sliding window number was a typo: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/commit/b043e05a86cfc77f8d53eb0edf6a33e39afbcb5e
The current code is in an ugly state, but at least the model produces the same output as the HF one.
The most notable change is that the Phi-3 model doesn't use parallel_residual, in contrast to Phi-1.5 and Phi-2.
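Roughly, the difference between the two block structures looks like this (a self-contained sketch, not the actual litgpt modules; Phi-3 actually uses RMSNorm, LayerNorm here just keeps it short):

```python
import torch.nn as nn


class ParallelResidualBlock(nn.Module):
    """Phi-1.5 / Phi-2 style: one shared norm, attention and MLP added in parallel."""

    def __init__(self, attn: nn.Module, mlp: nn.Module, dim: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn, self.mlp = attn, mlp

    def forward(self, x):
        n = self.norm(x)
        return x + self.attn(n) + self.mlp(n)


class SequentialResidualBlock(nn.Module):
    """Phi-3 (Llama-like) style: two norms, attention then MLP, each with its own residual."""

    def __init__(self, attn: nn.Module, mlp: nn.Module, dim: int) -> None:
        super().__init__()
        self.norm_1 = nn.LayerNorm(dim)
        self.norm_2 = nn.LayerNorm(dim)
        self.attn, self.mlp = attn, mlp

    def forward(self, x):
        x = x + self.attn(self.norm_1(x))
        return x + self.mlp(self.norm_2(x))
```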
The missing piece is the tokenizer: it has a smaller vocab size than Phi-2's (32k vs. 50k), extended by 64 special tokens. If I'm not mistaken, the current code doesn't add these tokens.
Yeah, that sounds about right based on the Phi-3 paper:
To best benefit the open source community, phi-3-mini is built upon a similar block structure as Llama-2 [TLI+23] and uses the same tokenizer with vocabulary size of 32064
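A quick check with plain `transformers` shows which tokens were added on top of the base Llama-2 vocab (whether litgpt's Tokenizer picks these up is the open question above):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(tok.vocab_size)         # base SentencePiece (Llama-2) vocab
print(len(tok))               # base vocab plus the added special tokens
print(tok.get_added_vocab())  # <|user|>, <|assistant|>, <|end|>, ...
```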
A related interesting post, @Andrei-Aksionov: https://x.com/danielhanchen/status/1795453604532207989

Required a number of changes, but it works. Also tried a quick LoRA finetune; no issues there.
@rasbt Could you check the changes in the READMEs? I'm not 100% sure I've done them correctly.
Thanks so much! I am currently moving and offline until the weekend/Monday. Will take a look when I am back!
I think the failing tests are because of the new Eval Harness release: https://pypi.org/project/lm-eval/#history
I can look into it in a separate PR
Yep, this is the reason. I "love" it when bug-fix releases break code.
All good now. Big thanks again @Andrei-Aksionov !!