
[BUG] The text generated from the 8 QNN context binaries produced by AI Hub is garbled.

Open taeyeonlee opened this issue 6 months ago • 2 comments

Dear Qualcomm,

Following the sample app (QNN API C++: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/sample_app.html), I built an Android app using the Android NDK (C++) and PyTorch to run the 8 QNN context binaries on the HTP backend on Samsung S24 Ultra / S23 Ultra phones. The generated text, shown below, is garbled. Could you please let me know what the problem is?

Prompt: "Could you tell me about Llama LLM Model ?"

Generated text: "Of course, no, the possibilities or quElse. providing 3 of a shelter, perhaps? no. Noal 2 verz (fileadas (after) for theCalculematic!Vos itif and that time,icing at the Sh internal one of the 8 image not alwaysCommanager notes this. to. or on time, perhaps the Or. (In US dollars Bill – because at'23 will, which book" (max number of tokens: 90)

llama_v2_7b_chat_quantized_PromptProcessor_1_Quantized.bin
llama_v2_7b_chat_quantized_PromptProcessor_2_Quantized.bin
llama_v2_7b_chat_quantized_PromptProcessor_3_Quantized.bin
llama_v2_7b_chat_quantized_PromptProcessor_4_Quantized.bin
llama_v2_7b_chat_quantized_TokenGenerator_1_Quantized.bin
llama_v2_7b_chat_quantized_TokenGenerator_2_Quantized.bin
llama_v2_7b_chat_quantized_TokenGenerator_3_Quantized.bin
llama_v2_7b_chat_quantized_TokenGenerator_4_Quantized.bin

On my Ubuntu PC, running the demo (taeyeon@Desktop-PC:~/QCT_AI_Hub$ python -m qai_hub_models.models.llama_v2_7b_chat_quantized.demo) generates correct text:

Prompt: "Could you tell me about Llama LLM Model ?"

Response: "Of course! Llama LLM (LLM) is a powerful language model developed by Meta AI that is capable of generating coherent and natural-sounding text. It is a variant of the LLaMA model (LLaMA: Open and Efficient Foundation Language Models, Touvron et al. 2023) and is specifically designed for tasks that require a more nuanced and expressive language" (max number of tokens: 90)

Has Qualcomm ever run the 8 QNN context binaries generated by AI Hub to produce text? What output do they generate for the prompt "Could you tell me about Llama LLM Model ?"?

I compared the inputs (input_ids, attention_mask, position_ids_cos, position_ids_sin, past_values) to PromptProcessor_1/2/3/4 and TokenGenerator_1/2/3/4 between my C++ code and Python, using the same prompt. The attention_mask, position_ids_cos, and position_ids_sin inputs match between my C++ code and Python for the first 9 generated tokens. However, the input_ids and past_values produced by the binaries differ between my C++ code and Python.
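For reference, the tensor comparison described above can be sketched as a small numpy script. The file names, directory layout, and raw-float32 dump format below are assumptions for illustration; they should be adapted to however the C++ app and the qai_hub_models demo actually serialize their tensors.

```python
# Sketch: compare tensors dumped from the C++ app against the Python demo.
# Assumes each side dumps every model input/output as a flat binary file
# of float32 values (adjust dtype for integer tensors such as input_ids).
import numpy as np

def load_raw(path, dtype=np.float32):
    """Load a flat binary tensor dump (dtype and layout are assumptions)."""
    return np.fromfile(path, dtype=dtype)

def compare(name, cpp_path, py_path, atol=1e-3):
    """Print and return whether two dumps agree within a tolerance."""
    a, b = load_raw(cpp_path), load_raw(py_path)
    if a.shape != b.shape:
        print(f"{name}: shape mismatch {a.shape} vs {b.shape}")
        return False
    max_diff = float(np.abs(a - b).max()) if a.size else 0.0
    ok = np.allclose(a, b, atol=atol)
    print(f"{name}: max abs diff {max_diff:.6f} -> {'OK' if ok else 'MISMATCH'}")
    return ok

# Hypothetical per-tensor dump files, one pair per model input:
# compare("input_ids", "cpp_dumps/input_ids.raw", "py_dumps/input_ids.raw")
# compare("past_values", "cpp_dumps/past_values.raw", "py_dumps/past_values.raw")
```

Running such a comparison for each input of each of the 8 binaries, token by token, helps pinpoint the first tensor where the C++ and Python pipelines diverge.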

Best regards,

taeyeonlee avatar Aug 09 '24 10:08 taeyeonlee