An illegal memory access was encountered when running the InternVL-Chat-V1.5-Int8 model
Hi all, thank you for your wonderful work!
I am trying to run the InternVL-Chat-V1.5-Int8 model from Hugging Face. The first inference returns a result, but the second one fails with:
RuntimeError: CUDA error: an illegal memory access was encountered.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Server: 4 NVIDIA A10G GPUs, 23 GB each
Memory: 192 GB
CPUs: 48
Docker image: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
Python version: py39_24.4.0
PyTorch and other packages: same versions as on the installation page
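Because the traceback passes through accelerate and bitsandbytes as well as PyTorch, the exact installed versions of those packages are useful to include alongside "same as the installation page". A minimal sketch that prints them using only the standard library (the package list is an assumption; adjust it as needed):

```python
from importlib import metadata


def package_versions(names):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages that appear in the traceback of this report.
    for pkg, ver in package_versions(["torch", "transformers", "accelerate", "bitsandbytes"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```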
What I have tried:
Setting the environment variables TORCH_CUDNN_V8_API_DISABLED=1 and CUDA_LAUNCH_BLOCKING=1, both at the system level and from Python via the os module.
Other PyTorch versions: pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0, and pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2
Another Docker image: nvidia/cuda:12.1.0-devel-ubuntu20.04
Another Python version: py310_24.3
None of these changes resolved the error.
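When these variables are set from Python rather than from the shell, ordering matters: CUDA_LAUNCH_BLOCKING only takes effect if it is in the environment before the CUDA context is created, so the safe pattern is to set both variables before importing torch. A minimal sketch, assuming the model is loaded later in the same script:

```python
import os


def set_cuda_debug_env():
    """Set debugging variables; call this before `import torch`."""
    # Make kernel launches synchronous so the Python traceback points at the
    # operation that actually faulted instead of a later, unrelated one.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    # The cuDNN setting tried in this report.
    os.environ["TORCH_CUDNN_V8_API_DISABLED"] = "1"


set_cuda_debug_env()
# import torch  # import (and load the model) only after the variables are set
```

The shell equivalent is to prefix the command, e.g. `CUDA_LAUNCH_BLOCKING=1 python internvl_test_int8_2.py`.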
Any suggestions or ideas would be appreciated.
Thank you very much!
The full error is:
dynamic ViT batch size: 7
Traceback (most recent call last):
  File "/app/internvl_test_int8_2.py", line 124, in
    response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_internvl_chat.py", line 304, in chat
    generation_output = self.generate(
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_internvl_chat.py", line 339, in generate
    vit_embeds = self.extract_feature(pixel_values)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_internvl_chat.py", line 211, in extract_feature
    vit_embeds = self.vision_model(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 411, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 347, in forward
    layer_outputs = encoder_layer(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 289, in forward
    hidden_states = hidden_states + self.drop_path1(self.attn(self.norm1(hidden_states)) * self.ls1)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 246, in forward
    x = self._naive_attn(hidden_states) if not self.use_flash_attn else self._flash_attn(hidden_states)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 229, in _flash_attn
    qkv = self.qkv(x)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 797, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 556, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/opt/conda/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 321, in forward
    CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
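The first log line, "dynamic ViT batch size: 7", is the number of image tiles passed to the vision encoder, and that batch is what reaches the 8-bit qkv projection where the error is raised. The sketch below shows how such a tile count can arise, assuming the dynamic-tiling scheme from the InternVL model cards (pick the tile grid whose aspect ratio best matches the image, then append one thumbnail when the image is split). The function name and the defaults `max_num=6` and `image_size=448` are assumptions for illustration, not values read from the failing script.

```python
def count_vit_tiles(width, height, min_num=1, max_num=6, image_size=448, use_thumbnail=True):
    """Sketch: number of tiles a dynamic-tiling preprocessor yields for one image."""
    aspect_ratio = width / height
    area = width * height
    # Candidate (cols, rows) grids with min_num <= cols * rows <= max_num, fewest tiles first.
    target_ratios = sorted(
        {
            (i, j)
            for n in range(min_num, max_num + 1)
            for i in range(1, n + 1)
            for j in range(1, n + 1)
            if min_num <= i * j <= max_num
        },
        key=lambda r: r[0] * r[1],
    )
    best_ratio, best_diff = (1, 1), float("inf")
    for cols, rows in target_ratios:
        diff = abs(aspect_ratio - cols / rows)
        if diff < best_diff:
            best_diff, best_ratio = diff, (cols, rows)
        elif diff == best_diff and area > 0.5 * image_size * image_size * cols * rows:
            # On a tie, switch to the larger grid only if the image is big enough to fill it.
            best_ratio = (cols, rows)
    blocks = best_ratio[0] * best_ratio[1]
    # A downscaled thumbnail of the whole image is added when the image was split.
    return blocks + 1 if use_thumbnail and blocks != 1 else blocks


print(count_vit_tiles(1344, 896))  # 3x2 grid plus thumbnail: 7
print(count_vit_tiles(448, 448))   # single tile, no thumbnail: 1
```

Under these assumptions, a 3:2 image produces exactly the batch of 7 seen in the log, and a smaller `max_num` would shrink that batch.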
Hello,
I apologize for the delayed response. You can now use our latest InternVL2 series models by following this guide: https://internvl.readthedocs.io/en/latest/internvl2.0/quick_start.html
Best regards.
I encountered the same issue. You need to adjust batch_size and batch_size_per_device so that the number of gradient accumulation steps is 4.
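Assuming the usual data-parallel relation, where the global batch_size equals batch_size_per_device × number of GPUs × gradient accumulation steps, the setting above can be checked with a quick calculation. The function name is hypothetical, and the 4-GPU figure is taken from the original report.

```python
def gradient_accumulation_steps(batch_size: int, batch_size_per_device: int, num_gpus: int) -> int:
    """Accumulation steps implied by a global batch size.

    Assumes batch_size = batch_size_per_device * num_gpus * accumulation_steps.
    """
    per_step = batch_size_per_device * num_gpus
    if batch_size % per_step != 0:
        raise ValueError(f"batch_size {batch_size} is not divisible by {per_step}")
    return batch_size // per_step


# 4 GPUs with 4 samples per device: a global batch_size of 64 gives 64 / (4 * 4) = 4.
print(gradient_accumulation_steps(64, 4, 4))  # 4
```

Under this assumption, keeping the accumulation at 4 on 4 GPUs means batch_size should be 16 × batch_size_per_device.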