kmn1024

Results: 11 comments by kmn1024

The same team now has pre-built models for many Whisper sizes too! https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/export-onnx.html#available-models Wonder how their performance compares.
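For anyone curious, a quick way to time one of those pre-built models is sherpa-onnx's Python API. A minimal sketch, assuming you've downloaded a model from that page; the file names below are placeholders, and the silent waveform just stands in for real 16 kHz audio:

```python
import time

import numpy as np
import sherpa_onnx

# Placeholder paths: point these at whichever pre-built Whisper size you grabbed.
recognizer = sherpa_onnx.OfflineRecognizer.from_whisper(
    encoder="tiny.en-encoder.onnx",
    decoder="tiny.en-decoder.onnx",
    tokens="tiny.en-tokens.txt",
)

samples = np.zeros(16000 * 5, dtype=np.float32)  # stand-in for 5 s of 16 kHz audio
stream = recognizer.create_stream()
stream.accept_waveform(16000, samples)

start = time.perf_counter()
recognizer.decode_stream(stream)
print(f"decode: {time.perf_counter() - start:.3f}s, text: {stream.result.text!r}")
```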

Furthermore, it seems the problem only happens if the initialization+inference code runs in a separate Process (as in the production environment). In a single-thread, single-process test case, the problem seems to...
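To make the repro shape concrete, this is roughly the structure I mean; the model path and tensor names are hypothetical, and the actual inputs are elided:

```python
import multiprocessing as mp

def load_and_infer():
    # Import and load inside the worker, mirroring the production setup.
    import MNN.nn as mnn_nn
    module = mnn_nn.load_module_from_file(
        "resources/decoder.mnn", ["input"], ["output"])  # hypothetical names
    # ... build input Vars and call module.forward(...) here ...

if __name__ == "__main__":
    load_and_infer()                       # same thread/process: works fine
    p = mp.Process(target=load_and_infer)  # separate Process: outputs go wrong
    p.start()
    p.join()
```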

It seems like adding `dynamic=True` to `mnn_nn.load_module_from_file` fixes the problem! However, it makes inference ~50% slower, even slower than ONNX =( `dynamic=True` makes sense, since the decoder input shape always changes,...
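For reference, the fix is just the one keyword argument; everything else about loading stays the same (path and tensor names are hypothetical):

```python
import MNN.nn as mnn_nn

module = mnn_nn.load_module_from_file(
    "resources/decoder.mnn",  # hypothetical path
    ["tokens", "kv_cache"],   # hypothetical input names
    ["logits"],               # hypothetical output names
    dynamic=True,             # correct outputs, but inference is ~50% slower
)
```

The trade-off makes sense: presumably `dynamic=True` makes MNN re-resolve shapes on every forward pass instead of fixing them at load time, which is exactly what a decoder with growing input lengths needs.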

Thanks for your help, @jxt1234! Can you explain more about fully using MNN.numpy? The values of `decoder_sess_kwargs` need to be numpy, since they are computed by numpy-based code....
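To illustrate what I think "fully using MNN.numpy" means: doing the surrounding computation with MNN.numpy, so the kwarg values are MNN Vars from the start and never cross a numpy boundary. A tiny sketch (the names and values are made up):

```python
import MNN.numpy as mnp

# Same arithmetic the numpy-based code does, but on MNN Vars throughout.
tokens = mnp.array([[1, 2, 3]])            # was: np.array(...)
shifted = tokens + 1                       # elementwise ops work like numpy
decoder_sess_kwargs = {"tokens": shifted}  # values stay MNN Vars, no copy needed
```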

@jxt1234 I have uploaded a simple test to reproduce the issue: https://mega.nz/file/pPVTGbBT#nCKr3OvKnXD8IiMHaGFG-4ZMW3455625qKxOYSRpiLA Once you download and expand it, there are 3 components:

```
decoder_iso_test.py
requirements.txt
resources/...
```

`resources/...` has the decoder...

I too have faced low-quality outputs after conversion, but for another application: https://github.com/wangzhaode/mnn-llm/issues/150 It probably comes down to the quantization algorithms available in MNN.

This issue (and the repo) feels pretty dead. What's happening? Are the maintainers working on something that obsoletes Medusa (https://github.com/FasterDecoding/REST)? Is the roadmap still active? @ctlllll @leeyeehoo

Thanks Yuhong =) Looking forward to it!!

I want to ask for some advice regarding model performance. My goal is to run a custom model on pretty cheap, OpenCL-compatible hardware. Using MLC, the current speed is ~...
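For context, this is roughly my setup on the MLC side, sketched with MLC's Python `MLCEngine`. The model id is a placeholder for my custom model, and I'm assuming the `device` argument is how you select the OpenCL backend:

```python
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # placeholder model id
engine = MLCEngine(model, device="opencl")  # assumption: device= picks the backend

# Stream a completion and eyeball tokens/sec.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
```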

Thanks for the heads-up! If you have a chance, please also include a recipe for adding new types of models.