yangyyj comments

Results: 10 comments of yangyyj

How to determine shadowLayers

Have you solved this problem? I have run into the same issue.

“Failed to register op package” error when running ./demo_qwen_npu createContext()

Have you solved this problem?

What do I need to add to run inference on the llama-3.1-8b model with demo_llama3?

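The output above is what I got on an x86 server (demo_llama3 in the bin directory). I also tested on a phone with an ARM architecture (demo_llama3 in the bin-arm directory). The result is shown below, and it seems to be even worse.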

What do I need to add to run inference on the llama-3.1-8b model with demo_llama3?

Below is the configuration section I added:
```
else if (billions == "8B" || billions == "8b") {
    vocab_size = 128256;
    hidden_dim = 4096;
    head_size = 32;
    num_key_value_heads = 8;
    ffn_hidden = 14336;
    block_num = 32;
    max_position_embeddings...
```

What do I need to add to run inference on the llama-3.1-8b model with demo_llama3?

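> Are you using the https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct model? How did you obtain the mllm model?

Yes, I am using that model. The model files are as follows: ![Image](https://github.com/user-attachments/assets/62db643e-2e80-4d0f-865e-22b9e73df52b)

I then used the code below to produce the fp32 file in mllm format: ![Image](https://github.com/user-attachments/assets/331b2055-c3f9-48c4-9de6-33f7742847dc)

Next, I quantized it to Q4K format with the following code: ![Image](https://github.com/user-attachments/assets/2deddd7f-f75b-45cd-baec-558a124b84f3)

Inference with either the fp32 file or the quantized file produces the problem described above.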

What do I need to add to run inference on the llama-3.1-8b model with demo_llama3?

Have you found out what is causing this? If it would be too much trouble to fix, just telling me the cause is enough and I will fix it myself. Thank you.

What do I need to add to run inference on the llama-3.1-8b model with demo_llama3?

Thank you very much for your answer. Could you tell me the size of the fp32 model you converted? Below is the size of the model I converted.
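For anyone comparing converted file sizes, here is a rough sanity check computed from the 8B configuration values quoted earlier in this thread. It is a sketch under assumptions that are mine and not confirmed by the thread: `head_size` is read as the number of attention heads, the model uses the standard Llama layout (no biases, two RMSNorm weights per block, one final norm), and the input embedding and output projection are stored separately. `LlamaConfig`, `count_params`, and `fp32_bytes` are hypothetical names, not part of mllm.

```cpp
#include <cassert>
#include <cstdint>

// Hyperparameters as quoted in the thread for the 8B branch.
struct LlamaConfig {
    std::uint64_t vocab_size = 128256;
    std::uint64_t hidden_dim = 4096;
    std::uint64_t head_size = 32;  // assumed to mean the number of attention heads
    std::uint64_t num_key_value_heads = 8;
    std::uint64_t ffn_hidden = 14336;
    std::uint64_t block_num = 32;
};

// Counts weights for a standard Llama-style decoder (assumed layout, see lead-in).
std::uint64_t count_params(const LlamaConfig &c) {
    assert(c.head_size > 0 && c.hidden_dim % c.head_size == 0);
    const std::uint64_t head_dim = c.hidden_dim / c.head_size;
    const std::uint64_t kv_dim = head_dim * c.num_key_value_heads;
    // Q and O projections are hidden x hidden; K and V are hidden x kv_dim (grouped-query attention).
    const std::uint64_t attn = 2 * c.hidden_dim * c.hidden_dim + 2 * c.hidden_dim * kv_dim;
    // Gate, up, and down projections of the feed-forward block.
    const std::uint64_t mlp = 3 * c.hidden_dim * c.ffn_hidden;
    // Two normalization weight vectors per block.
    const std::uint64_t norms = 2 * c.hidden_dim;
    // Token embedding plus a separate output projection.
    const std::uint64_t embeddings = 2 * c.vocab_size * c.hidden_dim;
    // The trailing hidden_dim term is the final normalization weight.
    return c.block_num * (attn + mlp + norms) + embeddings + c.hidden_dim;
}

// fp32 stores each weight in 4 bytes; file metadata is not included.
std::uint64_t fp32_bytes(const LlamaConfig &c) {
    return 4 * count_params(c);
}
```

Under these assumptions, `count_params(LlamaConfig{})` gives 8,030,261,248 weights, so the fp32 weights alone take 32,121,044,992 bytes (about 29.9 GiB). A converted fp32 file that is far from this size would suggest that some tensors were dropped or stored in a different type.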

What do I need to add to run inference on the llama-3.1-8b model with demo_llama3?

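Thank you for your patient answers. I pulled the latest code, recompiled the whole project, and tried again, and inference now succeeds (although I still have not figured out why the earlier runs hit the problem described above).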

Is it available to accelerate llama2 with NPU?

I completed the NPU inference part for the Llama2-7B model following the publicly available code. However, when I tested it on a 24GB device, it still crashed...

Inference hangs on the Android 8gen3 NPU

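Question 1: Two models are passed in. The int8 model is used for prefill, and the q4k (int4) model is used for the decode stage.

Question 2: For the hang, check the memory usage on your phone; memory may be exhausted. When I run a 7B model, the phone reboots outright because it runs out of memory (loading two models plus the computation graph makes memory usage very high).

Question 3: For the Qwen2.5-1.5B-Instruct.mllm argument you should pass the int8-quantized model, not the q40 model (that one is int4-quantized). You can obtain the int8 model by following the method described at https://github.com/UbiquitousLearning/mllm/blob/main/tools/convertor/profiling_activation/README.md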
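To put a rough number on the memory point, the sketch below estimates the weight storage when an int8 model and a Q4_K model are loaded together. The assumptions are mine: every weight is quantized, int8 costs 1 byte per weight with scales ignored, and Q4_K uses the ggml-style layout of 144 bytes per block of 256 weights (4.5 bits per weight). Whether mllm's files match this exactly is not confirmed here, and the computation graph and activations come on top. `weight_bytes` and `two_model_bytes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store n_params weights when every block of block_weights
// weights occupies block_bytes bytes (a partial last block is rounded up).
std::uint64_t weight_bytes(std::uint64_t n_params,
                           std::uint64_t block_bytes,
                           std::uint64_t block_weights) {
    assert(block_weights > 0);
    const std::uint64_t blocks = (n_params + block_weights - 1) / block_weights;
    return blocks * block_bytes;
}

// Combined weight storage for the prefill (int8) and decode (Q4_K) models.
std::uint64_t two_model_bytes(std::uint64_t n_params) {
    // Assumption: int8 is 1 byte per weight, quantization scales ignored.
    const std::uint64_t int8_model = weight_bytes(n_params, 1, 1);
    // Assumption: Q4_K packs 256 weights into 144 bytes, as in ggml.
    const std::uint64_t q4k_model = weight_bytes(n_params, 144, 256);
    return int8_model + q4k_model;
}
```

For a model with 7,000,000,000 weights, `two_model_bytes` returns 10,937,500,000 bytes, roughly 10.9 GB of weights before any graph or activation memory. This is only a lower-bound estimate under the stated assumptions, but it is consistent with a phone running out of memory on a 7B model.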