mistral.rs

MiniCPM-O 2.6 audio support

Open EricLBuehler opened this issue 1 year ago • 5 comments

  • Add the Whisper model

EricLBuehler avatar Jan 23 '25 12:01 EricLBuehler

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 69         2926         2534           77          315
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   18          627          556            2           69
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               46         3802            0         2891          911
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                15          512          433            0           79
 |- TOML                 2           75           63            0           12
 (Total)                           4625          717         2891         1017
-------------------------------------------------------------------------------
 Rust                  309        99706        89368         1933         8405
 |- Markdown           149         1690           25         1540          125
 (Total)                         101396        89393         3473         8530
===============================================================================
 Total                 467       111044        92653         7346        11045
===============================================================================
  

github-actions[bot] avatar Jan 23 '25 12:01 github-actions[bot]

Hey @EricLBuehler thanks for your work on this feature!

Is there a plan or roadmap for this MLLM feature, and if yes, can we join in to help deliver support of MiniCPM-o?

eugenehp avatar Feb 23 '25 00:02 eugenehp

Hi @eugenehp!

> Is there a plan or roadmap for this MLLM feature, and if yes, can we join in to help deliver support of MiniCPM-o?

I'm not currently focusing on this PR (I just merged the Phi 4 multimodal model and am currently working on audio support). I would absolutely be happy to add you as a collaborator if you are able to help!

EricLBuehler avatar Mar 03 '25 22:03 EricLBuehler

Roger that @EricLBuehler!

I've been playing around with the FFTs to get better mel spectrogram support for the audio processing.
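For readers following along, here is a minimal sketch of the step that turns FFT output into mel bands. It is not code from this PR: the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `apply_filterbank`) are hypothetical, and it assumes the HTK mel formula with plain triangular filters. Whisper-style front ends may use a different mel variant and normalization, so treat this as an illustration only.

```rust
/// Convert a frequency in Hz to mels (HTK formula; an assumption of this sketch).
fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Build `n_mels` triangular filters, each covering the `n_fft / 2 + 1`
/// non-negative frequency bins of an `n_fft`-point FFT.
fn mel_filterbank(n_mels: usize, n_fft: usize, sample_rate: f64) -> Vec<Vec<f64>> {
    let n_bins = n_fft / 2 + 1;
    let mel_max = hz_to_mel(sample_rate / 2.0);
    // n_mels + 2 edge points, equally spaced on the mel scale, mapped back to Hz.
    let points: Vec<f64> = (0..n_mels + 2)
        .map(|i| mel_to_hz(mel_max * i as f64 / (n_mels + 1) as f64))
        .collect();
    let bin_hz = sample_rate / n_fft as f64;
    (0..n_mels)
        .map(|m| {
            let (lo, mid, hi) = (points[m], points[m + 1], points[m + 2]);
            (0..n_bins)
                .map(|k| {
                    let f = k as f64 * bin_hz;
                    let rising = (f - lo) / (mid - lo);
                    let falling = (hi - f) / (hi - mid);
                    // Triangle: the lower of the two slopes, clamped at zero.
                    rising.min(falling).max(0.0)
                })
                .collect()
        })
        .collect()
}

/// Weight a power spectrum (one value per FFT bin) by each filter and sum,
/// giving one energy value per mel band.
fn apply_filterbank(filters: &[Vec<f64>], power: &[f64]) -> Vec<f64> {
    filters
        .iter()
        .map(|f| f.iter().zip(power).map(|(w, p)| w * p).sum::<f64>())
        .collect()
}
```

With a 400-point FFT at 16 kHz and 80 bands, `mel_filterbank(80, 400, 16000.0)` yields 80 filters of 201 weights each; `apply_filterbank` then reduces each frame's power spectrum to 80 mel energies, which are typically log-scaled afterwards.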

I'm far away from doing a proper PR, but I would love your feedback once it's ready.

Re: Phi4 sounds amazing. Going to check it out!

Unlike the Phi4 architecture, MiniCPM-o has streaming functionality. Have you had a chance to look into it while working on this PR? Any insights on the implementation would be helpful!

eugenehp avatar Mar 04 '25 20:03 eugenehp

@eugenehp I've sent a collaborator invite.

> I'm far away from doing a proper PR, but I would love your feedback once it's ready.

Sounds great.

> Unlike the Phi4 architecture, MiniCPM-o has streaming functionality. Have you had a chance to look into it while working on this PR? Any insights on the implementation would be helpful!

No, I haven't looked into the streaming functionality.

EricLBuehler avatar Mar 11 '25 09:03 EricLBuehler