ai-hub-models
[BUG] genie-t2t-run fails to run Llama v2 7B quantized on Galaxy S23 Ultra
When running Llama v2 7B quantized on the QNN HTP backend of Snapdragon 8 Gen 2 (Galaxy S23 Ultra), genie-t2t-run fails with the output below.
What does this error mean? "Could not create context from binary for context index = 0 : err 1009"
dm3q:/ $ export LD_LIBRARY_PATH=/data/local/tmp
dm3q:/data/local/tmp $ ./genie-t2t-run -c htp-model-config-llama2-7b.json -p "<<SYS>>\nYou are a helpful AI assistant.<</SYS>>\n\n[INST] have we been to Mars? [/INST]"
Using libGenie.so version 1.0.0
[WARN] "Unable to initialize logging in backend extensions."
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 300255744 across 8 buffers"
[ERROR] "Could not create context from binary for context index = 0 : err 1009"
[ERROR] "Create From Binary FAILED!"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.
1|dm3q:/data/local/tmp $
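For context, a context-creation failure at this stage is often caused by a mismatch between the serialized context binaries and the device (wrong target SoC / HTP architecture) or by QNN/Genie libraries on the device that come from a different SDK release than the one used to produce the binaries. Below is a minimal on-device sanity-check sketch, assuming everything was pushed to /data/local/tmp; the property values and library paths in the comments are assumptions for a standard Snapdragon 8 Gen 2 device, not taken from the log above.

# Run in an adb shell on the S23 Ultra:
getprop ro.soc.model          # expected: SM8550 (Snapdragon 8 Gen 2) -- assumption
getprop ro.board.platform     # expected: kalama -- assumption
# The HTP stub/skel libraries next to genie-t2t-run should come from the same
# QNN SDK release that serialized the context binaries.
ls /data/local/tmp/libQnnHtp*.so /data/local/tmp/libGenie.so
# The HTP skel library is loaded on the DSP side, so its directory typically has to
# be on ADSP_LIBRARY_PATH in addition to LD_LIBRARY_PATH (paths below are the
# commonly documented defaults; adjust for your device).
export ADSP_LIBRARY_PATH="/data/local/tmp;/vendor/dsp/cdsp;/vendor/lib/rfsa/adsp"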
The .bin files (llama2_0.serialized.bin, llama2_1.serialized.bin, llama2_2.serialized.bin, llama2_3.serialized.bin) were generated with --target-gen snapdragon-gen2, as shown below:
python gen_ondevice_llama.py --hub-model-id m1q8lpygn,mrmdjx4km,mkngj646n,mknjj0gxn,meq2dy80m,mzmx5gykn,m6qejgw7m,mwn0p5d8m --output-dir ./export --tokenizer-zip-path ./tokenizer.zip --target-gen snapdragon-gen2 --target-os android
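On the host side, the QNN SDK ships a qnn-context-binary-utility tool that dumps the metadata embedded in a serialized context binary; that output should show which SoC / HTP architecture each context was built for, which would confirm whether --target-gen snapdragon-gen2 actually took effect. A hedged sketch, assuming the tool from the same SDK release is on PATH and that the flag names match your installed SDK version:

# Dump the metadata of one generated context binary to JSON and check the reported
# target SoC / DSP architecture (the filename mirrors the list above; adjust the
# path to wherever gen_ondevice_llama.py placed it under ./export).
qnn-context-binary-utility --context_binary llama2_0.serialized.bin --json_file llama2_0.json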