Bo He
I think there are some non-deterministic operations in the PyTorch code; it is not an issue with the seed setting.
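For anyone who wants to track this down, here is a minimal sketch (not part of the repository) that seeds everything and forces PyTorch to raise an error on non-deterministic ops so the offending operation can be identified:

```python
import os
import random
import numpy as np
import torch

def enforce_determinism(seed: int = 42):
    # Seed every RNG (seeding alone does not remove all non-determinism)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Some CUDA ops (e.g. cuBLAS matmuls) need this workspace setting
    # before they can run deterministically
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    # Use deterministic cuDNN kernels and disable autotuning
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # Raise an error whenever an op without a deterministic
    # implementation is hit, which pinpoints the culprit
    torch.use_deterministic_algorithms(True)
```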
Hello, I have updated the demo.ipynb. The model loads the default config from lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml. If you want to load a finetuned checkpoint, you need to first set load_finetuned=True...
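As a rough sketch (the model name follows the default LAVIS config above; the checkpoint path and the "model" key are assumptions based on the usual LAVIS checkpoint format), loading a finetuned checkpoint on top of the default weights could look like:

```python
import torch
from lavis.models import load_model_and_preprocess

# Build the architecture from the default config
# (lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml)
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct",
    model_type="vicuna7b",
    is_eval=True,
    device="cuda",
)

# Placeholder path; alternatively set load_finetuned=True and point the
# "finetuned" field of the yaml config at this file.
ckpt = torch.load("path/to/finetuned_checkpoint.pth", map_location="cpu")
# LAVIS checkpoints usually store the weights under the "model" key
model.load_state_dict(ckpt["model"], strict=False)
```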
Hello! Currently, the downloading script only supports the MSRVTT and MSVD datasets. To obtain other datasets, please refer to the provided links and download the videos using the official download...
Thanks for pointing out this bug. I fixed this error and updated it in the latest commit. For the query memory bank, you can check the detailed code here https://github.com/boheumd/MA-LMM/blob/main/lavis/models/blip2_models/blip2.py#L166
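If it helps, here is a simplified sketch of the memory-bank compression idea (merge the most similar temporally adjacent pair whenever the bank overflows); the linked blip2.py code is the authoritative implementation:

```python
import torch
import torch.nn.functional as F

def compress_memory_bank(bank: torch.Tensor, max_len: int) -> torch.Tensor:
    """Simplified sketch: bank is a (T, D) tensor of per-frame features.
    While the bank is longer than max_len, average the two temporally
    adjacent entries that are most similar to each other."""
    while bank.size(0) > max_len:
        # Cosine similarity between each entry and its next neighbor: (T-1,)
        sim = F.cosine_similarity(bank[:-1], bank[1:], dim=-1)
        i = int(sim.argmax())                 # most redundant adjacent pair
        merged = (bank[i] + bank[i + 1]) / 2  # merge the pair by averaging
        bank = torch.cat([bank[:i], merged.unsqueeze(0), bank[i + 2:]], dim=0)
    return bank
```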
Hello, I have updated the demo.ipynb in the latest version. You can easily specify the memory_bank_length and num_frames when loading the model. Please note that every time you change the...
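One way to do this (only a sketch: memory_bank_length and num_frames are assumed config keys, and the registered model name is the one implied by the default config path) is to override the model config before instantiating the model:

```python
from omegaconf import OmegaConf
from lavis.common.registry import registry

# Load the default model config and override the assumed keys
cfg = OmegaConf.load("lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml")
cfg.model.memory_bank_length = 10   # assumed config key
cfg.model.num_frames = 20           # assumed config key

# Build the model from the modified config
model_cls = registry.get_model_class("blip2_vicuna_instruct")
model = model_cls.from_config(cfg.model)
```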
Hello, please check the latest code update. Currently, max_num_frames is set to 120 by default. If you need to test the model on long videos, you need to set...
Hi, did you follow the instructions from https://github.com/salesforce/LAVIS/tree/main/projects/instructblip to download vicuna-7b v1.1 and apply the delta weights to the original LLaMA weights? Or, according to this [issue](https://github.com/salesforce/LAVIS/issues/365#issuecomment-1593017454), you can...
I have not come across the same problem before, but you can follow the instructions [here](https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md) to prepare the correct vicuna-v1.1 weights. You can clone the FastChat repository outside MA-LMM,...
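For reference, applying the v1.1 delta to the original LLaMA weights goes through FastChat's apply_delta entry point; a hedged sketch with placeholder paths (the flags follow the FastChat documentation linked above):

```python
import subprocess

# Apply the vicuna-7b v1.1 delta weights to the original LLaMA-7B weights.
# Both local paths below are placeholders.
subprocess.run(
    [
        "python", "-m", "fastchat.model.apply_delta",
        "--base-model-path", "/path/to/llama-7b",
        "--target-model-path", "/path/to/vicuna-7b-v1.1",
        "--delta-path", "lmsys/vicuna-7b-delta-v1.1",
    ],
    check=True,
)
```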
Hello. You can refer to the following code to calculate the rho and tau results:
https://github.com/e-apostolidis/CA-SUM/blob/main/evaluation/choose_best_model.py#L67
https://gitlab.uni-hannover.de/hussainkanafani/unsupervised-video-summarization/-/blob/master/src/evaluation/BaseEvaluator.py#L7
https://github.com/TIBHannover/MSVA/blob/master/train.py#L430
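The common recipe, roughly what those scripts do, is to compute Spearman's rho and Kendall's tau between the predicted frame-importance scores and each human annotation, then average over annotators. A minimal sketch with SciPy (function and variable names are mine, not from those repositories):

```python
import numpy as np
from scipy import stats

def rank_correlations(pred_scores: np.ndarray, human_scores: np.ndarray):
    """pred_scores:  (num_frames,) predicted importance scores
    human_scores: (num_annotators, num_frames) per-annotator ground truth"""
    rhos, taus = [], []
    for gt in human_scores:
        rho, _ = stats.spearmanr(pred_scores, gt)   # Spearman's rho
        tau, _ = stats.kendalltau(pred_scores, gt)  # Kendall's tau
        rhos.append(rho)
        taus.append(tau)
    # Average the correlations across annotators
    return float(np.mean(rhos)), float(np.mean(taus))
```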
Hi, you can refer to the implementation details section in the main paper. For the SumMe and TVSum datasets, we adopt the pre-trained image captioning model GPT-2 to generate the...