ImportError: cannot import name '_expand_mask' from 'transformers.models.clip.modeling_clip'

Open qiuchen001 opened this issue 7 months ago • 4 comments

scenario: CLI inference

command: CUDA_VISIBLE_DEVICES=0 python3 -m videollava.serve.cli --model-path "/root/Video-LLaVA-7B" --file "/root/videos/8132-207209040_small.mp4" --load-4bit

issues:
[2024-07-21 04:02:21,967] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/root/.conda/envs/video-llava/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/root/.conda/envs/video-llava/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/root/Video-LLaVA/videollava/__init__.py", line 1, in <module>
    from .model import LlavaLlamaForCausalLM
  File "/root/Video-LLaVA/videollava/model/__init__.py", line 1, in <module>
    from .language_model.llava_llama import LlavaLlamaForCausalLM, LlavaConfig
  File "/root/Video-LLaVA/videollava/model/language_model/llava_llama.py", line 26, in <module>
    from ..llava_arch import LlavaMetaModel, LlavaMetaForCausalLM
  File "/root/Video-LLaVA/videollava/model/llava_arch.py", line 21, in <module>
    from .multimodal_encoder.builder import build_image_tower, build_video_tower
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/builder.py", line 3, in <module>
    from .languagebind import LanguageBindImageTower, LanguageBindVideoTower
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/__init__.py", line 6, in <module>
    from .image.modeling_image import LanguageBindImage
  File "/root/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/image/modeling_image.py", line 11, in <module>
    from transformers.models.clip.modeling_clip import CLIPMLP, CLIPAttention, CLIPTextEmbeddings, CLIPVisionEmbeddings,
ImportError: cannot import name '_expand_mask' from 'transformers.models.clip.modeling_clip' (/root/.conda/envs/video-llava/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py)

I've already installed the required packages:

git clone https://github.com/PKU-YuanGroup/Video-LLaVA
cd Video-LLaVA
conda create -n videollava python=3.10 -y
conda activate videollava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install decord opencv-python git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d

After that, I also ran: pip install -U transformers
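
For anyone hitting the same error, a quick diagnostic can show which transformers version is actually installed in the active environment and whether the private helper is still exposed. This is only a sketch: `check_private_symbol` is a hypothetical helper name, not part of Video-LLaVA or transformers, and it rests on the assumption that `_expand_mask` is a private function that newer transformers releases no longer provide in `modeling_clip`, so the final `pip install -U transformers` is the likely trigger.

```python
import importlib
from importlib import metadata


def check_private_symbol(module_name: str, symbol: str, dist_name: str) -> dict:
    """Hypothetical helper: report the installed version of `dist_name`
    and whether `module_name` can be imported and exposes `symbol`."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = None

    try:
        module = importlib.import_module(module_name)
        available = hasattr(module, symbol)
    except ImportError:
        # The module itself (or one of its own imports) could not be loaded.
        available = False

    return {"version": version, "available": available}
```

Calling `check_private_symbol("transformers.models.clip.modeling_clip", "_expand_mask", "transformers")` inside the conda environment used for the CLI should return `available: False` if the upgrade removed the helper. In that case, comparing the reported version with the transformers requirement in the repository's `pyproject.toml` and reinstalling that version is the first thing to try.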

qiuchen001, Jul 20 '24 20:07