nullname comments

Results: 104 comments of nullname

PR: Refine ggml-qnn backend(QNN, Qualcomm Neural Network,aka Qualcomm AI Engine Direct) for latest ggml,whisper.cpp,llama.cpp

> How do you handle the QNN graph build-execute-free during inference? As we are also integrating the QNN in our framework, the graph building is time consuming and the memory...

PR: Refine ggml-qnn backend(QNN, Qualcomm Neural Network,aka Qualcomm AI Engine Direct) for latest ggml,whisper.cpp,llama.cpp

> > > How do you handle the QNN graph build-execute-free during inference? As we are also integrating the QNN in our framework, the graph building is time consuming and...

Support for Snapdragon X Elite NPU & GPU

> I tried to find out whether GPU/NPU/... use can help with power consumption, both for prompt processing/fill (compute-bound) and token generation/decode (mainly memory-bandwidth bound). See [my medium.com article](https://medium.com/@andreask_75652/power-consumption-of-our-ai-use-f2b1f9bce97b), where I analyzed...
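The split between compute-bound prefill and bandwidth-bound decode can be made concrete with a back-of-envelope sketch. It rests on the usual approximations, stated here as assumptions:

- Each decoded token streams roughly all weights from memory once.
- Prefill costs about 2 FLOPs per parameter per prompt token.

The function names and all device and model figures below are hypothetical. None of them come from the linked article.

```python
def decode_tokens_per_s_upper_bound(weight_bytes: float, mem_bw_bytes_per_s: float) -> float:
    # Decode: each new token reads (roughly) every weight once, so memory
    # bandwidth caps throughput no matter how much compute is available.
    return mem_bw_bytes_per_s / weight_bytes


def prefill_seconds_compute_bound(n_params: float, n_prompt_tokens: int, flops_per_s: float) -> float:
    # Prefill: ~2 FLOPs (one multiply, one add) per parameter per prompt token.
    # Prompt tokens are batched into matmuls, so compute dominates.
    return 2.0 * n_params * n_prompt_tokens / flops_per_s


# Hypothetical figures, chosen only to illustrate the arithmetic.
weights_bytes = 4.0e9   # ~4 GB of quantized weights
mem_bw = 64.0e9         # assumed sustained DRAM bandwidth, bytes/s
n_params = 7.0e9        # assumed parameter count
compute = 10.0e12       # assumed sustained FLOP/s of the compute unit

print(f"decode  <= {decode_tokens_per_s_upper_bound(weights_bytes, mem_bw):.1f} tokens/s")
print(f"prefill ~= {prefill_seconds_compute_bound(n_params, 512, compute):.3f} s for 512 tokens")
```

Under these assumptions, a faster compute unit shortens prefill. It leaves the decode ceiling unchanged unless it also has more memory bandwidth.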

sunxi-6.10: Add armbian patches

Hi @The-going, regarding the sunxi PWM and GMAC drivers, I created a PR against your fork; please have a look: https://github.com/The-going/armbian-build/pull/25

`Automatic` board configs status synchronise

@igorpecovnik, info updated.

Feature Request: Compile bug: QCM6490 Platform Support

Hi @hiwudery, sorry, Hexagon v68 is a bit old now; I'm focusing on v73 and newer architectures. If the public API differences between v68 and the newer toolchains are small, you...

[qnn][bug] Inference crashes when an FP16 matmul is assigned to run on qnn_npu

Hi, for the performance issue you can follow this thread; QNN's convert really does perform rather poorly: https://github.com/chraac/llama.cpp/issues/34#issuecomment-2708050770

[qnn][bug] Inference crashes when an FP16 matmul is assigned to run on qnn_npu

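> I observed that the bind_tensors function copies data from ggml_tensor.data into qnn_rpc_buffer, but because should_use_mem_handle is always false, that copy never actually happens. So does qnn-npu rely on the SDK to copy the data internally every time it uses it? Or does it share memory with the CPU?

Nice catch! The RPC buffer is disabled here because using an RPC buffer only at the per-tensor level unavoidably adds an extra memcpy. If the ggml tensor's data is handed to QNN directly instead, QNN may have a better way to handle it. I had also considered letting the backend buffer manage the rpc_buffer. However, that would put multiple tensors into a single buffer, which doesn't seem to be achievable in QNN. That approach is implemented in `hexagon-npu`, though, so in theory it is more efficient there.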
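The trade-off above can be illustrated with a toy model. This is not the ggml-qnn or hexagon-npu code, and it makes no QNN calls. It compares the host-side bytes copied per graph run under three strategies:

- a private RPC buffer per tensor;
- handing over the tensor data pointer directly;
- one backend-level buffer holding many tensors at aligned offsets.

The class and function names, the strategy labels, and the 128-byte alignment are all hypothetical.

```python
from dataclasses import dataclass, field

ALIGNMENT = 128  # hypothetical; real alignment requirements depend on the backend


def align_up(n: int, alignment: int = ALIGNMENT) -> int:
    # Round n up to the next multiple of alignment.
    return (n + alignment - 1) // alignment * alignment


@dataclass
class SharedBackendBuffer:
    """Hypothetical single allocation holding many tensors at aligned offsets
    (the layout a backend-buffer-managed RPC buffer would require)."""
    capacity: int
    offsets: dict = field(default_factory=dict)
    used: int = 0

    def place(self, name: str, nbytes: int) -> int:
        # Simple bump allocation: next aligned offset after the last tensor.
        offset = align_up(self.used)
        if offset + nbytes > self.capacity:
            raise MemoryError(f"tensor {name!r} ({nbytes} B) does not fit")
        self.offsets[name] = offset
        self.used = offset + nbytes
        return offset


def host_copy_bytes_per_run(tensor_sizes: list, strategy: str) -> int:
    """Bytes memcpy'd by our own code before each graph execution."""
    if strategy == "per_tensor_rpc_buffer":
        # ggml_tensor.data -> private RPC buffer, for every bound tensor, every run
        return sum(tensor_sizes)
    if strategy == "shared_backend_buffer":
        # tensors already live inside the shared buffer, so nothing to copy
        return 0
    if strategy == "direct_pointer":
        # pointer handed over as-is; any copy the SDK makes internally is not counted
        return 0
    raise ValueError(f"unknown strategy: {strategy}")


sizes = [2048 * 512 * 2, 2048 * 2 * 4]  # hypothetical tensor sizes in bytes
buf = SharedBackendBuffer(capacity=4 << 20)
for i, n in enumerate(sizes):
    print(f"tensor{i}: offset {buf.place(f'tensor{i}', n)}")
for s in ("per_tensor_rpc_buffer", "shared_backend_buffer", "direct_pointer"):
    print(s, host_copy_bytes_per_run(sizes, s))
```

The shared-buffer layout removes the per-run copy by allocating tensors where the accelerator reads them. The cost is that one buffer must be shared by several tensors.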

[qnn][bug] Inference crashes when an FP16 matmul is assigned to run on qnn_npu

You can look at the event log that QNN prints internally. It essentially rules out other factors and shows the performance of the QNN graph alone. On 8gen2:

```log
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]print_profile_events start ----------------
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]event[0]: Number of HVX threads used, count: 4
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]event[1]: RPC (execute) time, duration: 29.409 ms
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]event[2]: QNN accelerator (execute) time, duration: 25.280 ms
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]event[3]:...
```
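The two durations in that log can be compared directly. The sketch below parses the quoted event lines and computes the gap between the RPC execute time and the accelerator execute time. It assumes the RPC figure wraps the accelerator figure, so that the difference approximates transport and dispatch overhead. The regex and helper name are illustrative, not part of any QNN tooling.

```python
import re

# Event lines copied from the log above (the truncated event[3] is omitted).
LOG = """\
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]event[0]: Number of HVX threads used, count: 4
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]event[1]: RPC (execute) time, duration: 29.409 ms
[profiler][MUL_MATf32_2048x512q4_K_2048x2f32f16_1024f16]event[2]: QNN accelerator (execute) time, duration: 25.280 ms
"""

# Only events that report a ", duration: <x> ms" value are captured.
DURATION_RE = re.compile(r"event\[\d+\]: (?P<name>.+?), duration: (?P<ms>[\d.]+) ms")


def parse_durations(log: str) -> dict:
    return {m["name"]: float(m["ms"]) for m in DURATION_RE.finditer(log)}


d = parse_durations(LOG)
rpc = d["RPC (execute) time"]
accel = d["QNN accelerator (execute) time"]
overhead = rpc - accel
print(f"RPC overhead: {overhead:.3f} ms ({overhead / rpc:.1%} of RPC time)")
```

With the figures above, about 4.1 ms of the 29.4 ms is spent outside the accelerator. The remaining ~25 ms is the graph execution itself.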

[qnn][bug] Inference crashes when an FP16 matmul is assigned to run on qnn_npu

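> I looked at qnn's .alloc_buffer again and found that it doesn't actually allocate any NPU memory, which may be why my attempt above failed. From what I found, the NPU uses VTCM rather than DDR shared with the CPU. So is all NPU memory management done inside the SDK?

You can check the memory section of its programming reference: https://docs.qualcomm.com/bundle/publicresource/topics/80-N2040-61/memory.html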