MAGAer13 comments

Results: 21 comments of MAGAer13

What is the process of video input? There is no preprocessing code in your code, but Hugging Face spaces supports video input.

We will not update the paper, but we will include the specification of the model's design in the video branch. The code and weights will be released

What is the process of video input? There is no preprocessing code in your code, but Hugging Face spaces supports video input.

> > > really need it ! > > > > > > We will release the video version in this week! > > Hi, did you release the video...

NaN error on videoQA

See #101. Also, we have updated the checkpoint on HF.

[Suggestion] Could you add the minimum resource requirement in the readme doc?

Nice suggestion. You only need a GPU with **24GB memory (e.g., RTX 3090) or even 16GB memory (e.g., V100)** under **fp16 or bf16 precision**. We update in the...
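As a rough sanity check on these figures, the memory needed just to hold the weights can be estimated from the parameter count and the bytes per parameter. The sketch below assumes a model of about 7 billion parameters; that count is an illustrative assumption, not something stated in this thread, and the estimate ignores activations, generation caches, and framework overhead.

```python
# Rough estimate of the GPU memory needed to hold model weights only.
# ASSUMPTION: the 7e9 parameter count below is illustrative and is not
# taken from the thread; activations and overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the weight footprint in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


if __name__ == "__main__":
    assumed_params = 7_000_000_000  # hypothetical model size
    for dtype in ("fp32", "fp16", "bf16"):
        print(f"{dtype}: {weight_memory_gib(assumed_params, dtype):.1f} GiB")
```

Under that assumption, half precision halves the fp32 footprint, which is why a 16GB card can be enough for inference while full precision would not fit even in 24GB.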

Support VQA with Multiple Choices

You can just add your options to the prompt and use the model in an open-ended generation style. We will release mPLUG-Owl-2 soon, which is a better foundation model, and it can better support...
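A minimal sketch of that idea: format the options into the question text, then look for an option letter in the free-form answer. The function names, the prompt wording, and the letter-matching heuristic are all hypothetical and are not part of the repository.

```python
import re
from string import ascii_uppercase


def build_multiple_choice_prompt(question, options):
    """Append lettered options to a question (hypothetical prompt layout)."""
    if not options:
        raise ValueError("at least one option is required")
    if len(options) > len(ascii_uppercase):
        raise ValueError("too many options to label with single letters")
    lines = [question, "Options:"]
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"({letter}) {option}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def parse_choice(generated, num_options):
    """Return the first standalone option letter in the output, or None.

    This is a heuristic: a standalone capital such as the article "A"
    would also match, so real outputs may need stricter parsing.
    """
    valid = ascii_uppercase[:num_options]
    match = re.search(rf"\b([{valid}])\b", generated)
    return match.group(1) if match else None


if __name__ == "__main__":
    prompt = build_multiple_choice_prompt(
        "What color is the car in the image?", ["Red", "Blue", "Green"]
    )
    print(prompt)
    print(parse_choice("The answer is (B).", 3))
```

The resulting string would be passed to the model as the text part of the query, and the generated answer parsed afterwards.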

what is the difference among the four task_types?

The xxx_sft value is just an indicator of the dataset name and task type. You only need to replace xxx with your own dataset name.

How multimodal and unimodal data are mixed

We randomly mix the text data and multi-modal data. For each batch, we do not control the ratio; it is just randomly sampled, and the ratio within a batch would be similar...
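The mixing described here can be sketched as pooling both sources, shuffling once, and slicing into batches. The per-batch ratio is not controlled, but on average each batch reflects the overall proportion of the two sources. This is an illustrative sketch, not the repository's data loader; `mixed_batches` and its parameters are hypothetical names.

```python
import random


def mixed_batches(text_samples, multimodal_samples, batch_size, seed=None):
    """Yield batches drawn from a single shuffled pool of both sources.

    Each item is tagged with its source so the mix can be inspected.
    """
    rng = random.Random(seed)
    pool = [("text", s) for s in text_samples]
    pool += [("multimodal", s) for s in multimodal_samples]
    rng.shuffle(pool)  # no per-batch ratio control, just random order
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


if __name__ == "__main__":
    # Toy data: 10 text-only samples and 30 image-text samples.
    for batch in mixed_batches(range(10), range(30), batch_size=8, seed=0):
        n_text = sum(1 for source, _ in batch if source == "text")
        print(f"batch size {len(batch)}, text samples {n_text}")
```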

How to do the training on multiple images or image pair data?

I think you are missing the token that serves as the placeholder for the image inputs. You may try this: ``` {"image": ["image1.jpg","image2.jpg"], "text": "The following is a conversation between a curious human...

How to do the training on multiple images or image pair data?

I have met a similar case. I think there is some overflow during training. I recommend looking at the validation results, which are more reliable.
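One generic way to check whether overflow is distorting logged training numbers is to count non-finite loss values separately instead of averaging them in. The helper below is a hypothetical sketch and is not taken from the repository's training code.

```python
import math


def summarize_losses(losses):
    """Return (mean of finite losses, number of non-finite losses).

    NaN and +/-inf values, which typically appear after a numeric
    overflow, are excluded from the mean and counted instead.
    """
    finite = [x for x in losses if math.isfinite(x)]
    skipped = len(losses) - len(finite)
    mean = sum(finite) / len(finite) if finite else None
    return mean, skipped


if __name__ == "__main__":
    logged = [1.0, float("nan"), 3.0, float("inf")]
    mean, skipped = summarize_losses(logged)
    print(f"mean finite loss: {mean}, non-finite steps: {skipped}")
```

A nonzero count of non-finite steps suggests the training curve is unreliable over that range, which is consistent with the advice to rely on validation instead.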

Dtype error when loading multilingual model in 4bit

We have not tested 4bit.