Yoni Gozlan
Also added Udop to the list and to #32544.
Hi @davidgxue and @MnCSSJ4x, I was wondering if you've made any recent progress on uniformizing LLaVa and Paligemma. They're some of the last image-text-to-text models left to uniformize, and it...
No problem @davidgxue! I will get started on LLaVa then
Now that https://github.com/huggingface/transformers/pull/33385 has been merged, this should be ready for review!
Good timing ;) https://github.com/huggingface/transformers/pull/34170
> The example is super nice, but not all image-text-to-text models support multiple images reliably. I'd go for a simpler single-image example for now. Agreed with this! Another problem is...
Added one test for the LLaVa processor :). I could add one for every VLM processor that uses a chat template, but as they all use the same underlying `apply_chat_template`, I thought...
Thanks for the feedback @knkski! Although it's not really an objective of this pipeline, I think we can try to add support and raise a warning at least, wdyt @Rocketknight1...
@Rocketknight1 @knkski, text-only inference should be supported now :)
There are still some issues with the pipeline tests: - It seems that pipeline model tests are based on "tiny models" available on `hf-internal-testing`, but those tiny models don't seem to...