
Any hints for adding new language support?

uplg opened this issue 9 months ago • 6 comments

Hi there, thanks for the impressive work; it works flawlessly in English!

I'm struggling to find information on how to add new language support. I've dug through the original CSM repos and this one, but found no clue. I'd like to add French support.

(I'm also a beginner in these things, so any tips are appreciated.)

EDIT: I just saw that you plan to enable full language support for Kokoro (which includes French). Still, the voice is not as good as CSM / Orpheus.
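For reference, a minimal sketch of what generating French speech through Kokoro could look like once that support lands. The model repo and the French voice name here are assumptions, so check the model card and python -m mlx_audio.tts.generate --help for the exact identifiers:

# Sketch only: the Kokoro repo name and the ff_siwis voice are assumptions, verify them against the model card
python -m mlx_audio.tts.generate --model prince-canuma/Kokoro-82M --text "Bonjour tout le monde, ceci est un test." --voice ff_siwis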

Aside: while trying to use Orpheus, I get the error ValueError: Model type llama not supported.

uplg · Apr 05 '25 00:04

Just here to say that I'm facing the same issue:

Error loading model: Model type llama not supported.

When trying:

python -m mlx_audio.tts.generate --model mlx-community/orpheus-3b-0.1-ft-bf16 --text "Hello world" --voice tara --temperature 0.6 --audio_format mp3

https://huggingface.co/mlx-community/orpheus-3b-0.1-ft-bf16

johann-taberlet · Apr 10 '25 21:04

Could you share the whole traceback and which version of mlx-audio you are running?

Also, could you try installing from source and see if the issue persists?

Blaizzy · Apr 10 '25 22:04

@Blaizzy When running mlx-community/3b-ko-ft-research_release-6bit, the same issue occurs.

Here are the library versions used:

mlx-audio: 0.0.3 (the same issue occurs even when using the main branch via pip install git+https://github.com/Blaizzy/mlx-audio.git@main)
mlx-lm: 0.22.4
mlx: 0.24.2

Here is the error log:

% python -m mlx_audio.tts.generate --model mlx-community/3b-ko-ft-research_release-6bit --text "Hello, world"
Fetching 7 files: 100%|██████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 24818.37it/s]
ERROR:root:Model type llama not supported.
Error loading model: Model type llama not supported.
Traceback (most recent call last):
  File "/Users/user/miniconda3/envs/llama4/lib/python3.10/site-packages/mlx_audio/tts/utils.py", line 30, in get_model_and_args
    arch = importlib.import_module(f"mlx_audio.tts.models.{model_type}")
  File "/Users/user/miniconda3/envs/llama4/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/user/miniconda3/envs/llama4/lib/python3.10/site-packages/mlx_audio/tts/models/llama/__init__.py", line 1, in <module>
    from .llama import Model, ModelConfig
  File "/Users/user/miniconda3/envs/llama4/lib/python3.10/site-packages/mlx_audio/tts/models/llama/llama.py", line 13, in <module>
    from mlx_lm.utils import stream_generate
ImportError: cannot import name 'stream_generate' from 'mlx_lm.utils' (/Users/user/miniconda3/envs/llama4/lib/python3.10/site-packages/mlx_lm/utils.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/user/miniconda3/envs/llama4/lib/python3.10/site-packages/mlx_audio/tts/generate.py", line 92, in generate_audio
    model = load_model(model_path=model_path)
  File "/Users/user/miniconda3/envs/llama4/lib/python3.10/site-packages/mlx_audio/tts/utils.py", line 141, in load_model
    model_class, model_type = get_model_and_args(model_type=model_type)
  File "/Users/user/miniconda3/envs/llama4/lib/python3.10/site-packages/mlx_audio/tts/utils.py", line 34, in get_model_and_args
    raise ValueError(msg)
ValueError: Model type llama not supported.
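Note that the ValueError at the bottom masks the real failure shown above it: mlx_audio's llama model tries to import stream_generate from mlx_lm.utils, that import fails, and the exception handler then reports the model type as unsupported. A quick sketch of how one could inspect the installed versions and retry with the main branch (generic pip commands, not an official fix):

pip show mlx-audio mlx-lm mlx
pip install --upgrade git+https://github.com/Blaizzy/mlx-audio.git@main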

swlee60 · Apr 11 '25 03:04

Using the latest main, Orpheus works with this command (only on Python 3.11; I tried 3.12 and ran into dependency issues):

After pulling: pip install -r requirements.txt

Then: python -m mlx_audio.tts.generate --model mlx-community/orpheus-3b-0.1-pretrained-bf16 --text "The quick brown fox jumps over the lazy dog." --play

But this doesn't really help with adding new languages 😄

@swlee60 your command also works using the latest main but gives really bad results (definitely not a "Hello world") haha. Maybe it doesn't work for you because you are using Python 3.10 instead of 3.11.

@johann-taberlet It works using main; I just generated a hello world with the exact same command.

uplg · Apr 11 '25 18:04

Thanks @m1m1s1ku!

Indeed, Python 3.11 is recommended.
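For anyone who hit the dependency issues on 3.12 mentioned above, a minimal sketch of a clean 3.11 setup (conda is only one option; any environment manager works):

conda create -n mlx-audio python=3.11 -y
conda activate mlx-audio
git clone https://github.com/Blaizzy/mlx-audio.git && cd mlx-audio
pip install -r requirements.txt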

Blaizzy · Apr 11 '25 21:04

@m1m1s1ku @Blaizzy You're right. When running it on Python 3.11, the WAV file is generated correctly. Thank you!

swlee60 · Apr 12 '25 03:04