Yanwei

Results: 27 comments by Yanwei

Hi, it seems the system CUDA version and the CUDA version PyTorch was compiled with do not match. Please align them, e.g., to 11.7 or 12.2 for both.
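To confirm the mismatch, compare the toolkit version from `nvcc --version` with `torch.version.cuda`. A minimal sketch of that comparison (the helper name and hardcoded version strings are my own, for illustration):

```python
def cuda_versions_match(nvcc_version: str, torch_cuda_version: str) -> bool:
    """Compare major.minor of the system CUDA toolkit against the CUDA
    version PyTorch was compiled with (torch.version.cuda)."""
    major_minor = lambda v: tuple(v.split(".")[:2])
    return major_minor(nvcc_version) == major_minor(torch_cuda_version)

# Obtain the real values from `nvcc --version` and
# `python -c "import torch; print(torch.version.cuda)"`.
print(cuda_versions_match("11.7", "11.7"))  # True: toolchains aligned
print(cuda_versions_match("12.2", "11.7"))  # False: rebuild or reinstall PyTorch
```

If they disagree, reinstalling PyTorch from the wheel built for your toolkit version is usually simpler than changing the system CUDA.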

Hi, thanks for the suggestion! Because we do not have enough resources to validate the results of Zero3-offload at this time, we will try to support it later.

Hi, it seems this issue could be caused by [this line](https://github.com/dvlab-research/LLaMA-VID/blob/d1074f3662a772d1b3c723416af59314ba593f67/llamavid/model/builder.py#L91). Please check it and make sure the `CLIPVisionTower` in `llamavid/model/multimodal_encoder/clip_encoder.py` is loaded normally.
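A quick way to rule out an import problem before launching the full model is a small preflight check. This helper is my own sketch (not part of the repo); the commented-out module path and class name come from the file mentioned above:

```python
import importlib

def can_import(module_path: str, attr: str) -> bool:
    """Return True when `attr` is importable from `module_path` --
    a quick preflight before the full model load."""
    try:
        module = importlib.import_module(module_path)
        return hasattr(module, attr)
    except ImportError:
        return False

# Inside the LLaMA-VID repo, the corresponding check would be:
# can_import("llamavid.model.multimodal_encoder.clip_encoder", "CLIPVisionTower")
print(can_import("json", "loads"))  # stdlib example: True
```

If this returns False inside the repo, the traceback from a direct `import` of that module will usually point to the missing dependency.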

Hi, it seems the data is not loaded correctly. Please make sure all the downloaded datasets are organized as described in the README.
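One way to catch a misplaced dataset early is to list which expected sub-paths are missing before training starts. The function and the example paths below are hypothetical; the authoritative layout is the one in the repo README:

```python
from pathlib import Path

def missing_paths(root, expected):
    """Return the expected sub-paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]

# Hypothetical layout -- substitute the folders the README actually lists.
print(missing_paths("./data", ["llava/images", "videochatgpt/videos"]))
```

An empty list means every expected folder is present; anything printed is a folder to re-download or move.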

Hi, for LLaMA 7B and 13B, we follow the instruction format in [LLaVA](https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/pretrain.sh). In the pretraining stage, the main focus is image captioning, so it works well with `plain`...

Hi, it is not officially supported, but you can try it; the whole process should be straightforward.