Can't use audio models downloaded by entering a Hugging Face ID (i.e. not from the gallery)
The app determines a model's modality from its pipeline-tag property. This is set correctly on models downloaded from our gallery, but for models fetched directly from Hugging Face it comes back as null.
We already worked out some patterns that make diffusion models work correctly. Maybe there is a similar "good enough" way to determine whether a model is TTS or STT.
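A rough sketch of what such a "good enough" fallback could look like, in Python. This is not the app's actual logic: the function name `guess_audio_modality`, the hint lists, and the "tts"/"stt" return values are all assumptions made for illustration. The idea is to trust pipeline-tag when it is present and only fall back to keyword matching on the model ID and tags when it is null.

```python
import re
from typing import Iterable, Optional

# Pipeline tags used on Hugging Face for audio models, mapped to the two
# modalities discussed here. The "tts"/"stt" labels are hypothetical names.
TAG_TO_MODALITY = {
    "text-to-speech": "tts",
    "text-to-audio": "tts",
    "automatic-speech-recognition": "stt",
}

# Hypothetical keyword hints; illustrative guesses, not a vetted list.
TTS_HINTS = ("tts", "text-to-speech", "orpheus", "kokoro")
STT_HINTS = ("stt", "asr", "speech-to-text", "whisper")


def _normalize(text: str) -> str:
    # Lowercase, collapse every non-alphanumeric run to "-", and pad with "-"
    # so hints only match whole name segments, not arbitrary substrings.
    return "-" + re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") + "-"


def guess_audio_modality(
    model_id: str,
    pipeline_tag: Optional[str] = None,
    tags: Iterable[str] = (),
) -> Optional[str]:
    """Return "tts", "stt", or None if the model does not look like audio."""
    # If a pipeline tag exists, trust it and skip the guessing entirely.
    if pipeline_tag is not None:
        return TAG_TO_MODALITY.get(pipeline_tag)

    # Fallback: look for hint words in the model ID and in any repo tags.
    haystacks = [_normalize(model_id)] + [_normalize(t) for t in tags]
    for modality, hints in (("tts", TTS_HINTS), ("stt", STT_HINTS)):
        if any(f"-{hint}-" in hay for hint in hints for hay in haystacks):
            return modality
    return None
```

With made-up model IDs, `guess_audio_modality("example-org/whisper-small-mlx")` would return "stt" and `guess_audio_modality("example-org/Kokoro-demo-bf16")` would return "tts", while a model with pipeline tag "text-generation" returns None regardless of its name. Name matching will misclassify some models, so it is probably only suitable as a default that the user can override.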
This is only for MLX, right? I think it works normally for non-MLX models.
I can check, but I think it affects any model, because we rely on pipeline-tag. Maybe the tag gets read from Hugging Face in a way we already handle, but I suspect not.
I think the primary problem is that MLX audio models have no pipeline tag defined on Hugging Face, which causes this issue. The other models have their pipeline tags specified correctly.
This can be checked by downloading the Unsloth Orpheus model and training it.
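The presence or absence of the tag can probably also be checked without downloading anything, by asking the Hub for the repo's metadata. A minimal sketch, assuming the `huggingface_hub` package is installed and the network is reachable; the helper name `pipeline_tag_for` and the `fetch_info` parameter are hypothetical and exist only so the lookup can be swapped out offline.

```python
from typing import Any, Callable, Optional


def pipeline_tag_for(
    repo_id: str,
    fetch_info: Optional[Callable[[str], Any]] = None,
) -> Optional[str]:
    """Return the repo's pipeline tag, or None if the repo does not define one."""
    if fetch_info is None:
        # Third-party dependency, imported lazily; model_info() queries the Hub.
        from huggingface_hub import model_info

        fetch_info = model_info

    info = fetch_info(repo_id)
    # The returned metadata exposes pipeline_tag; it is None when the model
    # card does not declare one, which is the case suspected for MLX audio models.
    return getattr(info, "pipeline_tag", None)
```

Calling `pipeline_tag_for(...)` on one MLX audio repo and one non-MLX audio repo should show whether the null comes from Hugging Face itself or from how the app reads the value.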