
Repository-transformers config mismatch

Open orrzohar opened this issue 1 year ago • 3 comments

I get the following error when running your newly updated repository:

AttributeError: 'LlavaConfig' object has no attribute 'mm_use_im_start_end'. Did you mean: 'mm_use_x_start_end'?

Note that the config.json file on Hugging Face does not define 'mm_use_im_start_end': https://huggingface.co/LanguageBind/Video-LLaVA-7B/blob/main/config.json
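For anyone hitting the same AttributeError, one possible local workaround is to read the flag defensively instead of accessing it directly. This is only a minimal sketch, not the repository's actual fix: the helper name use_start_end_tokens is hypothetical, a SimpleNamespace stands in for the real LlavaConfig, and the value False for mm_use_x_start_end is an assumption made for illustration.

```python
from types import SimpleNamespace


def use_start_end_tokens(config, names=("mm_use_im_start_end", "mm_use_x_start_end")):
    """Return the first start/end flag found on config, or False if none exists.

    Hypothetical helper: it only avoids the AttributeError by checking
    each candidate attribute name before reading it.
    """
    for name in names:
        if hasattr(config, name):
            return bool(getattr(config, name))
    return False


# Stand-in for the loaded config (assumption: the real object exposes
# 'mm_use_x_start_end', as the error message hints; the value is made up).
config = SimpleNamespace(mm_use_x_start_end=False)

print(use_start_end_tokens(config))
```

Falling back to False when neither attribute is present means the start/end tokens are simply not added, which may or may not match the behavior the maintainers intend.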

It is unclear whether your setup is meant to use the video start/end tokens or only the video tokens.

Also note that the same function uses the now-deprecated DEFAULT_X_START_TOKEN:

https://github.com/PKU-YuanGroup/Video-LLaVA/blob/e93f4927eaa926ed8450b481fde95c994ed23d2d/videollava/eval/video/run_inference_video_qa.py#L49-L53

Best, Orr

orrzohar avatar Feb 18 '24 04:02 orrzohar

Sorry, we fixed that. Could you try it again? We do not use the video start/end tokens.

LinB203 avatar Feb 18 '24 04:02 LinB203

Yeah, I made the same edit on my local repo. When do you use the _act eval inference file? Also, I am getting a module import error (videollava) that I am trying to resolve at the moment. Have you tried running instruction tuning with the current repository?

orrzohar avatar Feb 18 '24 04:02 orrzohar

To evaluate on the ActivityNet dataset, we use the _act.py script. Run pip install -e . from the repository root to install videollava. Yes, I have tested the training scripts.
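A quick way to confirm that the editable install resolved the import error is to ask Python whether the package can be found. This is just a sketch using the standard library; the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on the current path."""
    return importlib.util.find_spec(module_name) is not None


# 'videollava' should be found once 'pip install -e .' has been run
# in the active environment; otherwise this prints False.
print(is_importable("videollava"))
```

If this prints False after installing, the install most likely went into a different environment than the one running the script.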

LinB203 avatar Feb 18 '24 05:02 LinB203