V3 support for as many tasks as possible in HuggingFace, starting with "text-to-image"

Open tengomucho opened this issue 1 month ago • 0 comments

With V2, the HuggingFace class allowed inference on many different models performing different tasks. After moving to V3, many of those tasks no longer work.

I would like to see them working again with the V3 ModelBuilder interface, and to be able to use a short code snippet to deploy a model. Here's the complete list of the tasks we used to support:

"text-classification",
"token-classification",
"table-question-answering",
"question-answering",
"zero-shot-classification",
"translation",
"summarization",
"feature-extraction",
"text-generation",
"fill-mask",
"sentence-similarity",
"automatic-speech-recognition",
"text-to-image",
"text-to-speech",
"audio-to-audio",
"audio-classification",
"image-classification",
"object-detection",
"image-segmentation"
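
To make progress on this easier to track, the list above could be kept as a small status table. This is only a minimal sketch: `V2_TASKS`, `KNOWN_STATUS` and `v3_status` are hypothetical names I made up for illustration, not part of any SDK, and the only statuses filled in are the two stated in this issue.

```python
# Hypothetical tracking helper for this issue; not part of any SDK.
V2_TASKS = (
    "text-classification",
    "token-classification",
    "table-question-answering",
    "question-answering",
    "zero-shot-classification",
    "translation",
    "summarization",
    "feature-extraction",
    "text-generation",
    "fill-mask",
    "sentence-similarity",
    "automatic-speech-recognition",
    "text-to-image",
    "text-to-speech",
    "audio-to-audio",
    "audio-classification",
    "image-classification",
    "object-detection",
    "image-segmentation",
)

# Only the two statuses reported in this issue are recorded here.
KNOWN_STATUS = {
    "text-generation": "working",
    "text-to-image": "broken",
}


def v3_status(task: str) -> str:
    """Return the reported V3 status of a task, or "unverified" if unknown."""
    if task not in V2_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return KNOWN_STATUS.get(task, "unverified")
```

Calling `v3_status("summarization")` returns `"unverified"` until someone confirms whether that task works on V3.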

The most important tasks for us are "text-generation" (already working on V3) and "text-to-image" (currently broken). It would be great if at least these two could be made to work.

An example of a model that we would like to test is black-forest-labs/FLUX.2-dev.

Dec 09 '25 16:12 tengomucho