
Should Sampling Rate be Automatically Set with Default Sampling Rate Each Model Uses?

Open chigkim opened this issue 9 months ago • 2 comments

Is there a reason why the sampling rate is settable by the user? Doesn't every model have a default sampling rate at which it operates? If so, I wonder whether the sampling rate should be returned by the model when it is loaded, instead of being set by the user.

For example, Spark TTS uses a 16 kHz sampling rate, and if you use mlx_audio.tts.generate with the --play flag, it plays the generated audio at the default 24 kHz, creating a chipmunk effect. If I change generate.py to delay initializing the player until the first result is available, so that it gets the right sampling rate, it works.

        for i, result in enumerate(results):
            if play:
                if i == 0:
                    # Create the AudioPlayer only once the first result is
                    # available, so it uses the model's actual sample rate.
                    player = AudioPlayer(sample_rate=result.sample_rate)
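
The same idea can be sketched as a self-contained example. This is only an illustration, not the actual generate.py: `Result`, `FakePlayer`, `queue_audio`, and `play_results` are hypothetical stand-ins, and the only assumption carried over from the snippet above is that each result exposes a `sample_rate` attribute.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Result:
    """Hypothetical stand-in for one generated segment."""
    audio: List[float]
    sample_rate: int


class FakePlayer:
    """Hypothetical stand-in for AudioPlayer; records audio instead of playing it."""

    def __init__(self, sample_rate: int):
        self.sample_rate = sample_rate
        self.queued: List[List[float]] = []

    def queue_audio(self, audio: List[float]) -> None:
        self.queued.append(audio)


def play_results(results: Iterable[Result], player_cls=FakePlayer) -> Optional[FakePlayer]:
    """Create the player lazily, using the sample rate of the first result."""
    player = None
    for result in results:
        if player is None:
            # Deferred until now so the player matches the model's output rate.
            player = player_cls(sample_rate=result.sample_rate)
        player.queue_audio(result.audio)
    return player  # None if nothing was generated
```

Checking `player is None` instead of `i == 0` also covers a generator that yields nothing, in which case no player is created at all.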

Thanks!

chigkim avatar May 14 '25 15:05 chigkim

Yes, this would be nice! It's not always present in the model config, so we may need a manual mapping for some models. Do you want to send a PR? 🙏
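
A rough sketch of what such a fallback could look like. Everything here is an assumption for illustration: the `sample_rate` and `model_type` config keys, the `KNOWN_SAMPLE_RATES` table, and the `resolve_sample_rate` helper are hypothetical names, not existing mlx-audio APIs. The 16000 entry and the 24000 default are the values mentioned earlier in this thread.

```python
from typing import Any, Mapping, Optional

# Hypothetical manual mapping for models whose config lacks a sample rate.
# The Spark value is the one reported in this thread.
KNOWN_SAMPLE_RATES = {
    "spark": 16000,
}

# Default playback rate mentioned in this thread.
DEFAULT_SAMPLE_RATE = 24000


def resolve_sample_rate(config: Mapping[str, Any], model_type: Optional[str] = None) -> int:
    """Prefer the config value, then the manual mapping, then the default."""
    rate = config.get("sample_rate")  # assumed key name
    if rate is not None:
        return int(rate)
    key = model_type or config.get("model_type")  # assumed key name
    if key in KNOWN_SAMPLE_RATES:
        return KNOWN_SAMPLE_RATES[key]
    return DEFAULT_SAMPLE_RATE
```

With a helper like this, the model loader could return the resolved rate alongside the model, so callers would not need to pass one in.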

lucasnewman avatar May 14 '25 16:05 lucasnewman

Thanks for opening this issue!

Yes, I agree.

It's one of the things I kept in mind coming into v0.2.0 since we now have lots of models with varying sampling rates.

Unfortunately, it went to the backlog alongside a redesign of our Python APIs.

Blaizzy avatar May 14 '25 23:05 Blaizzy