Sebastian Raschka
> There is a modeling_*.py file.

> Good luck 🙂.

Haha, I finally got the weights loaded, but of course it's never easy ... of course it's generating gibberish ```...
Some more tidbits via [Daniel Han](https://twitter.com/danielhanchen/status/1782853167572832650): > Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to...
Looks like the sliding window number was a typo: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/commit/b043e05a86cfc77f8d53eb0edf6a33e39afbcb5e
> The missing piece is the Tokenizer: it has a smaller vocab size (32k vs 50k) that was extended by 64 special tokens. If I'm not mistaken, the current code...
A related, interesting post, @Andrei-Aksionov: https://x.com/danielhanchen/status/1795453604532207989
Thanks so much! I am currently moving and will be offline until the weekend/Monday. I will take a look when I am back!
I think the tests are failing because of the new Eval Harness release (https://pypi.org/project/lm-eval/#history). I can look into it in a separate PR.
All good now. Big thanks again @Andrei-Aksionov !!
Converted this to an issue to address this in the future. Will need some focus time with our web devs to tackle that.
Awesome, thanks for jumping in here. I would love to get some insights with respect to how to improve that. I should mention that I used CUDA 11.8. Let me try the sample...