ai-hub-models
[BUG] genie-t2t-run fails to run Llama v2 7B quantized on Galaxy S23 Ultra
When running Llama v2 7B quantized on the QNN HTP backend of Snapdragon 8 Gen 2 (Galaxy S23 Ultra), genie-t2t-run fails with the output below.
What does this error mean? "Could not create context from binary for context index = 0 : err 1009"
dm3q:/ $ export LD_LIBRARY_PATH=/data/local/tmp
dm3q:/data/local/tmp $ ./genie-t2t-run -c htp-model-config-llama2-7b.json -p "<<SYS>>\nYou are a helpful AI assistant.<</SYS>>\n\n[INST] have we been to Mars? [/INST]"
Using libGenie.so version 1.0.0
[WARN] "Unable to initialize logging in backend extensions."
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 1009"
[ERROR] "Create From Binary FAILED!"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.
1|dm3q:/data/local/tmp $
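For context, a context-creation failure at this stage is often caused by a mismatch between the serialized context binaries and the device (wrong target SoC / HTP architecture) or by QNN/Genie libraries on the device that come from a different SDK release than the one used to produce the binaries. Below is a minimal on-device sanity-check sketch, assuming everything was pushed to /data/local/tmp; the property values and library paths in the comments are assumptions for a standard Snapdragon 8 Gen 2 device, not taken from the log above.

# Run in an adb shell on the S23 Ultra:
getprop ro.soc.model          # expected: SM8550 (Snapdragon 8 Gen 2) -- assumption
getprop ro.board.platform     # expected: kalama -- assumption
# The HTP stub/skel libraries next to genie-t2t-run should come from the same
# QNN SDK release that serialized the context binaries.
ls /data/local/tmp/libQnnHtp*.so /data/local/tmp/libGenie.so
# The HTP skel library is loaded on the DSP side, so its directory typically has to
# be on ADSP_LIBRARY_PATH in addition to LD_LIBRARY_PATH (paths below are the
# commonly documented defaults; adjust for your device).
export ADSP_LIBRARY_PATH="/data/local/tmp;/vendor/dsp/cdsp;/vendor/lib/rfsa/adsp"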
The .bin files (llama2_0.serialized.bin, llama2_1.serialized.bin, llama2_2.serialized.bin, llama2_3.serialized.bin) were generated with --target-gen snapdragon-gen2, as shown below:
python gen_ondevice_llama.py --hub-model-id m1q8lpygn,mrmdjx4km,mkngj646n,mknjj0gxn,meq2dy80m,mzmx5gykn,m6qejgw7m,mwn0p5d8m --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android
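On the host side, the QNN SDK ships a qnn-context-binary-utility tool that dumps the metadata embedded in a serialized context binary; that output should show which SoC / HTP architecture each context was built for, which would confirm whether --target-gen snapdragon-gen2 actually took effect. A hedged sketch, assuming the tool from the same SDK release is on PATH and that the flag names match your installed SDK version:

# Dump the metadata of one generated context binary to JSON and check the reported
# target SoC / DSP architecture (the filename mirrors the list above; adjust the
# path to wherever gen_ondevice_llama.py placed it under ./export).
qnn-context-binary-utility --context_binary llama2_0.serialized.bin --json_file llama2_0.json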