DakeQQ
Feel free to reference this [repository](https://github.com/DakeQQ/Transcribe-and-Translate-Subtitles) — it's designed for CPU-only users. With an Intel i3-12300 CPU and Whisper-Large-V3-Turbo, it takes just 15 minutes to generate subtitles for a...
@Tamnac The transcription task utilizes ONNX Runtime with the OpenVINO Execution Provider, offering a technical approach distinct from Faster-Whisper.
@0wwafa Feel free to explore this [repository](https://github.com/DakeQQ/Transcribe-and-Translate-Subtitles), designed specifically for CPU-only users. With an Intel i3-12300 CPU (4 threads), it can transcribe a 2-hour movie in approximately 20 minutes using...
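For context, "a 2-hour movie in approximately 20 minutes" works out to a real-time factor (RTF) of roughly 0.17. A quick sketch of the arithmetic (the function name is mine, not from the repository):

```python
def real_time_factor(processing_minutes: float, audio_minutes: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    return processing_minutes / audio_minutes

# A 2-hour (120-minute) movie transcribed in ~20 minutes:
rtf = real_time_factor(20, 120)
print(round(rtf, 2))  # 0.17
```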
@0wwafa Thank you for trying it and for your feedback. We are working to make this tool more comparable to Faster-Whisper without reinventing the wheel.
Feel free to use [this script](https://github.com/DakeQQ/Native-LLM-for-Android/blob/main/Export_ONNX/MiniCPM/MiniCPM_Export.py) to export the model. Then, after [optimizing and quantizing](https://github.com/DakeQQ/Native-LLM-for-Android/blob/main/Do_Quantize/Dynamic_Quant/q4_f32.py) it, try it out in the [Android project](https://github.com/DakeQQ/Native-LLM-for-Android/tree/main/MiniCPM).
Thanks for the guidance~ The decode stage reuses this code:

```cpp
mMeta->add = ids_len;
std::vector<MNN::Express::VARP> outputs = module_A->onForward({input_ids, attention_mask, position_ids, logits_index});
mMeta->sync();
```

The inputs during decode are: `input_ids = embedding(highest-probability ids)`, while `attention_mask` and `position_ids` follow the `get_attention_mask()` / `get_position_ids()` code for the `seq_len=1` case. Does decode need any special handling? My simple understanding is that the main differences between decode and prefill are: `seq_len=1`, `position_ids` accumulates with each generated token, and `attention_mask` has no effect (=0).
Yes, exactly — during the decode stage this `add` does stay at +1, but from the second output token onward the generated words are completely wrong. We also tried feeding `onForward()` in the prefill style the whole time, and that does produce correct tokens, just much more slowly. My point is that the operator computation itself should be correct, so I suspect the KV cache is not working properly. When managing the KV cache, is there anything else that needs attention? Or do you think the problem lies elsewhere?
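To illustrate the bookkeeping being discussed, here is a minimal pure-Python sketch (not MNN code; all names are mine) of why, after a prefill of N tokens, the first decode step must use `position_id = N`, and each subsequent step must advance the position by exactly 1 while the cache grows by exactly one entry:

```python
class ToyKVCache:
    """Toy model of KV-cache bookkeeping: one slot per cached token position."""

    def __init__(self):
        self.cached = []  # stands in for the real per-layer K/V tensors

    def prefill(self, tokens):
        """Process the whole prompt at once; cache one entry per token."""
        self.cached.extend(tokens)
        return len(self.cached)  # the next position id to use

    def decode_step(self, token):
        """seq_len=1 step: the position id continues from the cache length."""
        position_id = len(self.cached)
        self.cached.append(token)  # the '+1 per step' growth of the cache
        return position_id

cache = ToyKVCache()
next_pos = cache.prefill(["t1", "t2", "t3", "t4"])  # prompt of 4 tokens
p1 = cache.decode_step("t5")  # first decode step
p2 = cache.decode_step("t6")  # second decode step
print(next_pos, p1, p2)  # 4 4 5
```

If the cached length and the fed `position_ids` ever disagree (e.g. the cache offset is not advanced, or positions restart from 0), the first decoded token can still look right while every later token degrades — consistent with the symptom described above.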
Hello~ Feel free to use this [link](https://github.com/DakeQQ/Automatic-Speech-Recognition-ASR-ONNX/blob/main/FireRedASR/Export_FireRedASR_AED.py) to export a simplified ONNX version yourself. The model graph already includes the `STFT` audio feature processing, so you only need to feed in `PCM-int16` data to get text output. Moreover, this export script is deeply optimized based on experience exporting LLMs, and it reaches a real-time factor (RTF) of about `0.17` on an `Intel CPU i3-12300`.
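As a quick illustration of what feeding raw `PCM-int16` means, here is a stdlib-only sketch that synthesizes a short int16 tone and packs it into the raw little-endian byte layout (the helper name and parameters are mine, not part of the export script):

```python
import math
import struct

def make_pcm_int16(freq_hz=440.0, sample_rate=16000, seconds=0.01):
    """Synthesize a sine tone as a list of int16 samples (-32768..32767)."""
    n = int(sample_rate * seconds)
    return [int(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
            for i in range(n)]

samples = make_pcm_int16()
# Pack to little-endian 16-bit bytes: 2 bytes per sample, no header, no float scaling.
raw = struct.pack(f"<{len(samples)}h", *samples)
print(len(samples), len(raw))  # 160 320
```

Because the exported graph handles the `STFT` feature extraction internally, no float conversion or mel-spectrogram preprocessing is needed on the caller's side.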