kmn1024

Results: 11 comments by kmn1024

The same team now has pre-built models for many Whisper sizes too! https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/export-onnx.html#available-models Wonder how their performance compares.
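For anyone curious, a quick way to time one of those pre-built models is sherpa-onnx's Python API. A minimal sketch, assuming you've downloaded a model from that page; the file names below are placeholders, and the silent waveform just stands in for real 16 kHz audio:

```python
import time

import numpy as np
import sherpa_onnx

# Placeholder paths: point these at whichever pre-built Whisper size you grabbed.
recognizer = sherpa_onnx.OfflineRecognizer.from_whisper(
    encoder="tiny.en-encoder.onnx",
    decoder="tiny.en-decoder.onnx",
    tokens="tiny.en-tokens.txt",
)

samples = np.zeros(16000 * 5, dtype=np.float32)  # stand-in for 5 s of 16 kHz audio
stream = recognizer.create_stream()
stream.accept_waveform(16000, samples)

start = time.perf_counter()
recognizer.decode_stream(stream)
print(f"decode: {time.perf_counter() - start:.3f}s, text: {stream.result.text!r}")
```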

Furthermore, it seems the problem only happens if the initialization+inference code runs in a separate Process (as in the production environment). In a single-thread, single-process test case, the problem seems to...
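To make the repro shape concrete, this is roughly the structure I mean; the model path and tensor names are hypothetical, and the actual inputs are elided:

```python
import multiprocessing as mp

def load_and_infer():
    # Import and load inside the worker, mirroring the production setup.
    import MNN.nn as mnn_nn
    module = mnn_nn.load_module_from_file(
        "resources/decoder.mnn", ["input"], ["output"])  # hypothetical names
    # ... build input Vars and call module.forward(...) here ...

if __name__ == "__main__":
    load_and_infer()                       # same thread/process: works fine
    p = mp.Process(target=load_and_infer)  # separate Process: outputs go wrong
    p.start()
    p.join()
```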

It seems like adding `dynamic=True` to `mnn_nn.load_module_from_file` fixes the problem! However, it makes inference ~50% slower, even slower than ONNX =( `dynamic=True` makes sense, since the decoder input shape always changes,...
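For reference, the fix is just the one keyword argument; everything else about loading stays the same (path and tensor names are hypothetical):

```python
import MNN.nn as mnn_nn

module = mnn_nn.load_module_from_file(
    "resources/decoder.mnn",  # hypothetical path
    ["tokens", "kv_cache"],   # hypothetical input names
    ["logits"],               # hypothetical output names
    dynamic=True,             # correct outputs, but inference is ~50% slower
)
```

The trade-off makes sense: presumably `dynamic=True` makes MNN re-resolve shapes on every forward pass instead of fixing them at load time, which is exactly what a decoder with growing input lengths needs.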

Thanks for your help, @jxt1234! Can you explain more about fully using MNN.numpy? The values of `decoder_sess_kwargs` need to be numpy, since they are computed by numpy-based code....
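To illustrate what I think "fully using MNN.numpy" means: doing the surrounding computation with MNN.numpy, so the kwarg values are MNN Vars from the start and never cross a numpy boundary. A tiny sketch (the names and values are made up):

```python
import MNN.numpy as mnp

# Same arithmetic the numpy-based code does, but on MNN Vars throughout.
tokens = mnp.array([[1, 2, 3]])            # was: np.array(...)
shifted = tokens + 1                       # elementwise ops work like numpy
decoder_sess_kwargs = {"tokens": shifted}  # values stay MNN Vars, no copy needed
```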

@jxt1234 I have uploaded a simple test to reproduce the issue: https://mega.nz/file/pPVTGbBT#nCKr3OvKnXD8IiMHaGFG-4ZMW3455625qKxOYSRpiLA Once you download and expand it, there are 3 components:

```
decoder_iso_test.py
requirements.txt
resources/...
```

`resources/...` has the decoder...

I too have faced low-quality outputs after conversion, but for another application: https://github.com/wangzhaode/mnn-llm/issues/150 It probably comes down to the quantization algorithms available in MNN.

This issue (and the repo) feels pretty dead. What's happening? Are the maintainers working on something that obsoletes Medusa (https://github.com/FasterDecoding/REST)? Is the roadmap still active? @ctlllll @leeyeehoo

Thanks Yuhong =) Looking forward to it!!

I want to ask for some advice regarding model performance. My goal is to run a custom model on pretty cheap, OpenCL-compatible hardware. Using MLC, the current speed is ~...
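For context, this is roughly my setup on the MLC side, sketched with MLC's Python `MLCEngine`. The model id is a placeholder for my custom model, and I'm assuming the `device` argument is how you select the OpenCL backend:

```python
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # placeholder model id
engine = MLCEngine(model, device="opencl")  # assumption: device= picks the backend

# Stream a completion and eyeball tokens/sec.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```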

Thanks for the heads-up! If you have a chance, please also include a recipe for adding new types of models.