
Can't use audio models downloaded by entering huggingface ID (i.e. not in gallery)

Open dadmobile opened this issue 1 month ago • 3 comments

The app determines a model's modality from its pipeline-tag property. This is set correctly on models downloaded from our gallery, but for models downloaded directly by Hugging Face ID it returns null.

We managed to figure out some patterns that make diffusion models work correctly. Maybe there is a "good enough" way to determine whether a model is TTS or STT.
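A rough sketch of what a name-based fallback might look like, in Python. Everything here is an assumption for illustration: the function name `guess_audio_pipeline_tag`, the keyword sets, and the example model IDs are hypothetical and not taken from the app's code. The only fixed points are the two Hugging Face pipeline tag values, `text-to-speech` and `automatic-speech-recognition`.

```python
import re
from typing import Optional

# Hypothetical keyword sets; a real list would need tuning against actual models.
TTS_HINTS = {"tts", "orpheus", "kokoro", "bark"}
STT_HINTS = {"stt", "asr", "whisper", "wav2vec2", "parakeet"}


def guess_audio_pipeline_tag(model_id: str, pipeline_tag: Optional[str] = None) -> Optional[str]:
    """Return the existing pipeline tag if set, else guess one from the model ID."""
    if pipeline_tag:
        # Trust the tag whenever the model metadata provides one.
        return pipeline_tag

    # Split the ID into lowercase alphanumeric tokens so that "bark"
    # does not match inside an unrelated word such as "embark".
    tokens = set(re.split(r"[^a-z0-9]+", model_id.lower()))

    if tokens & TTS_HINTS:
        return "text-to-speech"
    if tokens & STT_HINTS:
        return "automatic-speech-recognition"
    return None  # No audio hint found; leave the modality undetermined.
```

With these assumed keywords, an ID containing `whisper` would map to `automatic-speech-recognition` and one containing `orpheus` to `text-to-speech`, while anything else stays null as it does today.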

dadmobile avatar Nov 13 '25 18:11 dadmobile

This is only for MLX, right? I think it works normally for other, non-MLX models.

deep1401 avatar Nov 14 '25 20:11 deep1401

I can check, but I think it affects any model because we rely on pipeline-tag? But maybe that gets read from Hugging Face in a way we handle (though I suspect not)?

dadmobile avatar Nov 14 '25 20:11 dadmobile

I think the primary problem is that MLX audio models have no pipeline tag defined on Hugging Face, which causes this issue. The other models have their pipeline tags specified correctly.

This can be checked by downloading the unsloth Orpheus model and training it.

deep1401 avatar Nov 14 '25 20:11 deep1401