Feng Li

119 comments by Feng Li

Hi, are you using the llava-next-interleave model or the original single-image model?

Hi, `(img_num, 3, 384, 384)` works for our model in the multi-image setting. `(img_num, k, 3, 384, 384)` also works for our model when processing an anyres single image.
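To make the two layouts concrete, here is a minimal sketch that tells them apart by shape alone. `describe_image_input` is a hypothetical helper written for illustration, not part of the LLaVA-NeXT codebase, and reading `k` as the number of anyres crops per image is an assumption.

```python
def describe_image_input(shape):
    """Classify an image tensor shape as one of the two layouts named above.

    Hypothetical helper for illustration only; not part of LLaVA-NeXT.
    """
    shape = tuple(shape)
    # (img_num, 3, 384, 384): one 384x384 RGB tensor per image.
    if len(shape) == 4 and shape[1:] == (3, 384, 384):
        return "multi-image"
    # (img_num, k, 3, 384, 384): k is assumed to be the number of anyres crops.
    if len(shape) == 5 and shape[2:] == (3, 384, 384):
        return "anyres single-image"
    raise ValueError(f"unexpected image tensor shape: {shape}")


print(describe_image_input((4, 3, 384, 384)))
print(describe_image_input((1, 5, 3, 384, 384)))
```

A check like this can be run on `tensor.shape` before calling the model to catch a layout mismatch early.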

Hi, the LLaVA-Next-Interleave version is out, and it natively supports multi-image interleaved inputs. Please refer to [this evaluation code](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/inference/llava/eval/model_vqa.py) for the input format. It can directly handle the input format you provide....

I believe this is the PyTorch version.

Sorry for the late reply. How much memory do you need in our case? We use about 30 GB for ResNet-50 with batch size 4.

Hi, thanks for this question. You are using the first version of the eval json. We have updated the evaluation json. Please download a new one from our website. LMK...

How about the 7b results? Do they match the results in the table?

Hi, thanks for your feedback. We found that the previous 7B model was not our best model; we open-sourced the wrong one. You can download our new 7B...

We have not tried ONNX before, so we cannot offer any suggestions right now.

Sorry, we did not implement that. You are welcome to open a PR if you have any ideas.