30B model doesn't load
Following the same steps works for the 7B and 13B models, but with the 30B model I get:
thread 'main' panicked at 'Could not load model: Tensor tok_embeddings.weight has the wrong size in model file', llama-rs/src/main.rs:39:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Yes, I am experiencing the same issue.
Does the 30B model work for you in llama.cpp?
Yes, it works as expected in llama.cpp
This could be a size discrepancy due to integer promotion rules / a potential overflow, since the sizes for 30B are going to be larger. More liberal use of usize (#18) would probably help here.
Need to see if I can repro this.
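To illustrate the concern, here's a minimal sketch (my own, not the actual llama-rs code) of where i32 sizes get tight at 30B scale while usize stays comfortable:

```rust
// A minimal sketch of the overflow concern, assuming i32 sizes somewhere in
// the loader (this is NOT the actual llama-rs code). The per-tensor element
// counts for 30B still fit in an i32, but byte counts and whole-file sizes
// get close to or past i32::MAX, which is why wider usize arithmetic helps.
fn main() {
    let (n_vocab, n_embd): (i64, i64) = (32_000, 6_656); // tok_embeddings dims for 30B
    let nelements = n_vocab * n_embd; // 212_992_000: fits in an i32
    let nbytes_f32 = nelements * 4; //   851_968_000: fits, but close to i32::MAX
    assert!(nbytes_f32 < i32::MAX as i64);

    // The full f16 30B checkpoint, by contrast, is far past i32::MAX bytes:
    let checkpoint_bytes: u64 = 30_000_000_000 * 2;
    assert!(checkpoint_bytes > i32::MAX as u64);

    println!("nelements = {nelements}, tensor bytes (f32) = {nbytes_f32}");
}
```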
I have the same error. It happens at https://github.com/setzer22/llama-rs/blob/3ce15b1200a3419d31c2dbe44b4ebd569370409a/llama-rs/src/llama.rs#L446 with tensor.nelements() = 212992000, n_parts = 3, and nelements = 53248000.
llama.cpp returns the same values but with n_parts = 4, so it's not an i32 issue: https://github.com/setzer22/llama-rs/blob/3ce15b1200a3419d31c2dbe44b4ebd569370409a/llama-rs/src/llama.rs#L102 should say 4 instead of 3 (4 is what llama.cpp uses as well).
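To make the arithmetic concrete, here is a standalone sketch of that per-part size check, reconstructed from the values above (not the actual loader code):

```rust
// Reconstruction of the per-part size check that panics around llama.rs:446,
// using the values reported above. The loader splits tensors across n_parts
// files, so each part should hold total / n_parts elements.
fn main() {
    let total_nelements: u64 = 212_992_000; // tensor.nelements() for tok_embeddings
    let per_part_nelements: u64 = 53_248_000; // nelements read from one part file

    for n_parts in [3u64, 4] {
        if total_nelements / n_parts == per_part_nelements {
            println!("n_parts = {n_parts}: sizes match");
        } else {
            println!(
                "n_parts = {n_parts}: wrong size ({} != {per_part_nelements})",
                total_nelements / n_parts
            );
        }
    }
    // Output: n_parts = 3 fails, n_parts = 4 matches -- hence the fix below.
}
```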
Making that change (3 → 4) makes the model work for me. (Aside: the load time for the 30B model is brutal; if loading can be parallelized, it's absolutely worth it.)
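For what it's worth, here's a hedged sketch of what parallelizing the part loads could look like with scoped threads; the `<model>.bin.N` file naming is my assumption, not necessarily how llama-rs locates its parts:

```rust
// Sketch of the parallel-load idea: read all part files concurrently with
// scoped threads. The "<base>.N" naming for parts 1.. is an assumption.
use std::{fs, io, thread};

fn load_parts(base: &str, n_parts: usize) -> io::Result<Vec<Vec<u8>>> {
    let results: Vec<io::Result<Vec<u8>>> = thread::scope(|s| {
        let handles: Vec<_> = (0..n_parts)
            .map(|i| {
                // Part 0 is the base file; later parts carry a numeric suffix.
                let path = if i == 0 { base.to_string() } else { format!("{base}.{i}") };
                s.spawn(move || fs::read(path))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    results.into_iter().collect()
}

fn main() -> io::Result<()> {
    let parts = load_parts("models/30B/ggml-model-f16.bin", 4)?;
    println!("read {} part files", parts.len());
    Ok(())
}
```

Whether this actually helps depends on the disk; parallel reads mostly pay off on SSDs rather than spinning media.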
oh lol, good spot, you're correct that it's 4 in llama.cpp: https://github.com/ggerganov/llama.cpp/blob/721311070e31464ac12bef9a4444093eb3eaebf7/main.cpp#L34
@setzer22 can you do a quick change to main to fix that?
Pushed! Sorry about that :sweat_smile: It was just a typo on my end.
I was just able to load 30B with the changes on main, but I'll wait for others to confirm before closing the issue.
@setzer22 Working on my machine with main; the Alpaca fine-tuned model floating around also works with the project :smile:.
I confirm I can now load the 30B model with main.
But it only barely fits if you have 64GB of RAM
@RCasatta you mean the f16 version? Yes, I wasn't able to load that one on my machine (32GB). But I'm able to load the quantized one just fine.
Anyway, closing since the issue is solved, but feel free to keep discussing 😄
@RCasatta you mean the f16 version?
Yes, I meant the f16 version. I didn't know you could quantize 😅
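For reference, a rough back-of-the-envelope on why the f16 30B model barely fits in 64GB while the quantized one loads on a 32GB machine. The parameter count (~32.5B for LLaMA "30B") and the q4_0 block layout (32 weights stored as a 4-byte f32 scale plus 16 bytes of 4-bit values) are my assumptions, not something stated in this thread:

```rust
// Rough memory math, assuming ~32.5e9 parameters and ggml's q4_0 layout
// (20 bytes per block of 32 weights). These are estimates, not measurements.
fn main() {
    let params: f64 = 32.5e9;
    let f16_gb = params * 2.0 / 1e9; //        2 bytes/weight     -> ~65 GB
    let q4_0_gb = params / 32.0 * 20.0 / 1e9; // 20 bytes/32 weights -> ~20 GB
    println!("f16:  ~{f16_gb:.0} GB");
    println!("q4_0: ~{q4_0_gb:.0} GB");
}
```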