DakeQQ
Feel free to reference this [repository](https://github.com/DakeQQ/Transcribe-and-Translate-Subtitles) — it's designed for CPU-only users. With an Intel i3-12300 CPU and Whisper-Large-V3-Turbo, it takes just 15 minutes to generate subtitles for a...
@Tamnac The transcription task utilizes ONNX Runtime with the OpenVINO Execution Provider, offering a technical approach distinct from Faster-Whisper.
@0wwafa Feel free to explore this [repository](https://github.com/DakeQQ/Transcribe-and-Translate-Subtitles), designed specifically for CPU-only users. With an Intel i3-12300 CPU (4 threads), it can transcribe a 2-hour movie in approximately 20 minutes using...
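For context, "a 2-hour movie in approximately 20 minutes" works out to a real-time factor (RTF) of roughly 0.17. A quick sketch of the arithmetic (the function name is mine, not from the repository):

```python
def real_time_factor(processing_minutes: float, audio_minutes: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    return processing_minutes / audio_minutes

# A 2-hour (120-minute) movie transcribed in ~20 minutes:
rtf = real_time_factor(20, 120)
print(round(rtf, 2))  # 0.17
```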
@0wwafa Thank you for trying it and for your feedback. We are working to make this tool more comparable to Faster-Whisper without reinventing the wheel.
Feel free to use [this script](https://github.com/DakeQQ/Native-LLM-for-Android/blob/main/Export_ONNX/MiniCPM/MiniCPM_Export.py) to export the model. Then, after [optimizing and quantizing](https://github.com/DakeQQ/Native-LLM-for-Android/blob/main/Do_Quantize/Dynamic_Quant/q4_f32.py) it, try it out in the [Android project](https://github.com/DakeQQ/Native-LLM-for-Android/tree/main/MiniCPM).
Thanks for the guidance~ The decode stage reuses this code:

```cpp
mMeta->add = ids_len;
std::vector<MNN::Express::VARP> outputs = module_A->onForward({input_ids, attention_mask, position_ids, logits_index});
mMeta->sync();
```

The inputs during decode are: `input_ids = embedding(highest-probability ids)`, while `attention_mask` and `position_ids` follow the `get_attention_mask()` / `get_position_ids()` code for the `seq_len=1` case. Does decode need any special handling? My simple understanding is that the main differences between decode and prefill are: `seq_len=1`, `position_ids` accumulates with each generated token, and `attention_mask` has no effect (=0).
Yes, exactly — during the decode stage this `add` does stay at +1, but from the second output token onward the generated words are completely wrong. We also tried feeding `onForward()` in the prefill style the whole time, and that does produce correct tokens, just much more slowly. My point is that the operator computation itself should be correct, so I suspect the KV cache is not working properly. When managing the KV cache, is there anything else that needs attention? Or do you think the problem lies elsewhere?
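To illustrate the bookkeeping being discussed, here is a minimal pure-Python sketch (not MNN code; all names are mine) of why, after a prefill of N tokens, the first decode step must use `position_id = N`, and each subsequent step must advance the position by exactly 1 while the cache grows by exactly one entry:

```python
class ToyKVCache:
    """Toy model of KV-cache bookkeeping: one slot per cached token position."""

    def __init__(self):
        self.cached = []  # stands in for the real per-layer K/V tensors

    def prefill(self, tokens):
        """Process the whole prompt at once; cache one entry per token."""
        self.cached.extend(tokens)
        return len(self.cached)  # the next position id to use

    def decode_step(self, token):
        """seq_len=1 step: the position id continues from the cache length."""
        position_id = len(self.cached)
        self.cached.append(token)  # the '+1 per step' growth of the cache
        return position_id

cache = ToyKVCache()
next_pos = cache.prefill(["t1", "t2", "t3", "t4"])  # prompt of 4 tokens
p1 = cache.decode_step("t5")  # first decode step
p2 = cache.decode_step("t6")  # second decode step
print(next_pos, p1, p2)  # 4 4 5
```

If the cached length and the fed `position_ids` ever disagree (e.g. the cache offset is not advanced, or positions restart from 0), the first decoded token can still look right while every later token degrades — consistent with the symptom described above.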
Hello~ Feel free to use this [link](https://github.com/DakeQQ/Automatic-Speech-Recognition-ASR-ONNX/blob/main/FireRedASR/Export_FireRedASR_AED.py) to export a simplified ONNX version yourself. The model graph already includes the `STFT` audio feature processing, so you only need to feed in `PCM-int16` data to get text output. Moreover, this export script is deeply optimized based on experience exporting LLMs, and it reaches a real-time factor (RTF) of about `0.17` on an `Intel CPU i3-12300`.
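As a quick illustration of what feeding raw `PCM-int16` means, here is a stdlib-only sketch that synthesizes a short int16 tone and packs it into the raw little-endian byte layout (the helper name and parameters are mine, not part of the export script):

```python
import math
import struct

def make_pcm_int16(freq_hz=440.0, sample_rate=16000, seconds=0.01):
    """Synthesize a sine tone as a list of int16 samples (-32768..32767)."""
    n = int(sample_rate * seconds)
    return [int(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
            for i in range(n)]

samples = make_pcm_int16()
# Pack to little-endian 16-bit bytes: 2 bytes per sample, no header, no float scaling.
raw = struct.pack(f"<{len(samples)}h", *samples)
print(len(samples), len(raw))  # 160 320
```

Because the exported graph handles the `STFT` feature extraction internally, no float conversion or mel-spectrogram preprocessing is needed on the caller's side.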