MiniCPM-O 2.6 audio support
- Add the whisper model
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 69         2926         2534           77          315
 Shell                   1           58           22           18           18
 Plain Text              3         3723            0         2413         1310
 TOML                   18          627          556            2           69
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               46         3802            0         2891          911
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                15          512          433            0           79
 |- TOML                 2           75           63            0           12
 (Total)                           4625          717         2891         1017
-------------------------------------------------------------------------------
 Rust                  309        99706        89368         1933         8405
 |- Markdown           149         1690           25         1540          125
 (Total)                         101396        89393         3473         8530
===============================================================================
 Total                 467       111044        92653         7346        11045
===============================================================================
Hey @EricLBuehler, thanks for your work on this feature!
Is there a plan or roadmap for this MLLM feature, and if yes, can we join in to help deliver support of MiniCPM-o?
Hi @eugenehp!
> Is there a plan or roadmap for this MLLM feature, and if yes, can we join in to help deliver support of MiniCPM-o?
I'm not currently focusing on this PR (just merged the Phi 4 multimodal model & currently working on audio support). I would absolutely be happy to add you as a collaborator if you are able to help!
Roger that @EricLBuehler!
I've been playing around with the FFTs to get better mel spectrogram support for the audio processing.
I'm far away from doing a proper PR, but I would love your feedback once it's ready.
Re: Phi4 sounds amazing. Going to check it out!
Unlike the Phi4 architecture, MiniCPM-o has streaming functionality. Did you get a chance to look into it while you were working on this PR? Any insights on the implementation would be helpful!
@eugenehp I've sent a collaborator invite.
> I'm far away from doing a proper PR, but I would love your feedback once it's ready.
Sounds great.
> Unlike the Phi4 architecture, MiniCPM-o has streaming functionality. Did you get a chance to look into it while you were working on this PR? Any insights on the implementation would be helpful!
No, I haven't looked into the streaming functionality.