Amit Agarwal
Idefics3 follows the same pattern as Idefics2. Building HF transformers from source with https://github.com/huggingface/transformers/pull/32473 enables Idefics3. The model has been tested with the transformers library.
Add multi-turn support for InternVL2. Tested on the MMDU dataset via the full benchmark.
Across different LMMs, the max new tokens setting differs. I believe we should have a consistent MAX_NEW_TOKENS across the project, set to 512 or 1024. If it makes sense,...
Hey @kennymckormick @junming-yang, sharing offline eval results to help update the HF leaderboard: [InternVL2-76B.zip](https://github.com/user-attachments/files/16482630/InternVL2-76B.zip) [vila_3b.zip](https://github.com/user-attachments/files/16482631/vila_3b.zip) [vila_8b.zip](https://github.com/user-attachments/files/16482632/vila_8b.zip) [vila_13b.zip](https://github.com/user-attachments/files/16482633/vila_13b.zip) [vila_40b.zip](https://github.com/user-attachments/files/16482634/vila_40b.zip)
``` Generating with V2.0 A: torch.Size([7175, 3200]), B: torch.Size([9600, 3200]), C: (7175, 9600); (lda, ldb, ldc): (c_int(229600), c_int(307200), c_int(229600)); (m, n, k): (c_int(7175), c_int(9600), c_int(3200)) cuBLAS API failed with status...
Hey @kennymckormick @junming-yang, I want to add the MuirBench dataset to the kit. It has multiple images for a single query. Does VLM support any such dataset which I can refer...
While running the benchmark, the logs for many datasets show `Dataset MMMU_VAL is not officially supported` (the dataset name varies). Does that mean the results and benchmarks can be different...
What is the conv model for the 3B VILA 1.5?
Hey @kennymckormick @jinyu121 @FangXinyu-0913: I was planning to add the https://huggingface.co/mistralai/Pixtral-12B-2409 model to the eval framework. The Mistral team recommends using vLLM for inference. Is it okay to introduce...
### Motivation Hi Team, Thanks for the great effort and open-sourcing the MLLM and report. It was a great read to understand. One key question I wanted to ask was...