DakeQQ
Hello, I'd like to recommend an Android deployment project based on ONNX Runtime. The Depth Anything-Small model (resized) achieves 11 FPS on two A76 cores and reaches 22 FPS on the 8Gen2.
### Feature request https://github.com/DakeQQ/Native-LLM-for-Android Hello, I'd like to recommend an Android LLM deployment project based on ONNX Runtime. It achieves **5.2 tokens/s** on a Huawei P40 and **8.5 tokens/s** on the 8Gen2 (_q8f32 & 786 sliding-window context_). Once ONNX Runtime ships **q4f16** support, the speed may improve by another 50%.
### Describe the issue I am trying to run the QNN HTP backend on my Android device, but I repeatedly encounter the following error: `[E:onnxruntime:, qnn_execution_provider.cc: 513 GetCapability] QNN SetupBackend...
Hi~ I've successfully deployed YOLOv9 on the Qualcomm 8Gen2 NPU, hitting 47 FPS with v9-C and surpassing 100 FPS with v9-T. Eager to share these outstanding achievements...
Hello, when tokenizer.cpp line 627 executes `HuggingfaceTokenizer->encode_.at(s)`, it throws "no key found" because every string stored in `encode_` carries a trailing "\r". Likewise, `decode_.at(id)` cannot find its key. As a temporary fix I added `line.pop_back();` just below line 498. Could you please take a look at what's going on? Thank you!
Hello, MNN team. We are trying to reproduce the MNN-LLM project, focusing on the source code in `llm.cpp`. We have exported an MNN model with `llmexport.py` and imitated the loading and inference flow from `llm.cpp`.

```cpp
// Loading
mMeta = std::make_shared<KVMeta>();
runtimeManager->setHint(MNN::Interpreter::HintMode::QKV_QUANT_OPTIONS, 0); // Turn it off to ensure no accuracy loss during the test.
runtimeManager->setHintPtr(MNN::Interpreter::HintMode::KVCACHE_INFO, mMeta.get());

// Inference
mMeta->add = ids_len;
std::vector...
```