InternVL: an illegal memory access was encountered when running the InternVL-Chat-V1.5-Int8 model

Hi all, thank you for your wonderful work!

I am trying to run the InternVL-Chat-V1.5-Int8 model from the Hugging Face link. The first inference returns a result, but the second inference fails with:
RuntimeError: CUDA error: an illegal memory access was encountered. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Environment:
- Server: 4 GPUs, NVIDIA A10G, 23 GB each
- Memory: 192 GB
- Number of CPUs: 48
- Docker environment: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
- Python version: py39_24.4.0
- PyTorch and other packages: same versions as on the installation page

What I have tried:

Setting environment variables, either at the system level or with Python's os package:
TORCH_CUDNN_V8_API_DISABLED=1
CUDA_LAUNCH_BLOCKING=1

Other versions of PyTorch:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2

Other systems: nvidia/cuda:12.1.0-devel-ubuntu20.04

Other Python version: py310_24.3

None of these experiments solved the issue described above.
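For reference, this is roughly how the environment variables can be set from Python. It is a minimal sketch, not the exact test script: the helper name apply_debug_flags is made up for illustration, and it assumes the flags need to be in the environment before CUDA is initialized, which is why they are set before torch is imported.

```python
import os

# Flags tried in this report. The assumption here is that they must be
# exported before CUDA is initialized, so this runs before `import torch`.
DEBUG_FLAGS = {
    "CUDA_LAUNCH_BLOCKING": "1",         # synchronous kernel launches, so the traceback points at the failing call
    "TORCH_CUDNN_V8_API_DISABLED": "1",  # second flag tried in this report
}


def apply_debug_flags(flags=DEBUG_FLAGS):
    """Hypothetical helper: export each flag and return the values now set."""
    for name, value in flags.items():
        os.environ[name] = value
    return {name: os.environ[name] for name in flags}


if __name__ == "__main__":
    print(apply_debug_flags())
    # import torch  # import torch (and load the model) only after this point
```

With CUDA_LAUNCH_BLOCKING=1 the reported stack frame should correspond to the kernel that actually faulted, rather than to a later synchronization point.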

Please share any suggestions or ideas.

Thank you very much!

The full error is:

dynamic ViT batch size: 7
Traceback (most recent call last):
  File "/app/internvl_test_int8_2.py", line 124, in <module>
    response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_internvl_chat.py", line 304, in chat
    generation_output = self.generate(
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_internvl_chat.py", line 339, in generate
    vit_embeds = self.extract_feature(pixel_values)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_internvl_chat.py", line 211, in extract_feature
    vit_embeds = self.vision_model(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 411, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 347, in forward
    layer_outputs = encoder_layer(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 289, in forward
    hidden_states = hidden_states + self.drop_path1(self.attn(self.norm1(hidden_states)) * self.ls1)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 246, in forward
    x = self._naive_attn(hidden_states) if not self.use_flash_attn else self._flash_attn(hidden_states)
  File "/root/.cache/huggingface/modules/transformers_modules/OpenGVLab/InternVL-Chat-V1-5-Int8/acaaed06937c603ab04f084216ecb0268160f538/modeling_intern_vit.py", line 229, in _flash_attn
    qkv = self.qkv(x)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 797, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 556, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/opt/conda/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 321, in forward
    CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

May 23 '24 17:05 tairen99

Hello,

I apologize for the delayed response. You can now follow the guide below to use our latest InternVL2 series models: https://internvl.readthedocs.io/en/latest/internvl2.0/quick_start.html

Best regards.

Aug 06 '24 05:08 czczup

I encountered the same issue. You need to adjust batch_size and batch_size_per_device, ensuring that the gradient accumulation is 4.
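To make the arithmetic concrete, here is a small sketch of how the three values relate. It assumes the common convention that the effective batch size equals per-device batch size times number of GPUs times accumulation steps; the function name and the example numbers (32, 2, 4 GPUs) are illustrative, not taken from a specific config.

```python
def gradient_accumulation_steps(batch_size, batch_size_per_device, num_gpus):
    """Accumulation steps implied by a global batch size.

    Assumes: batch_size = batch_size_per_device * num_gpus * accumulation_steps.
    """
    samples_per_step = batch_size_per_device * num_gpus
    if batch_size % samples_per_step != 0:
        raise ValueError(
            "batch_size must be a multiple of batch_size_per_device * num_gpus"
        )
    return batch_size // samples_per_step


if __name__ == "__main__":
    # Illustrative values only: 4 GPUs, 2 samples per device, global batch of 32.
    print(gradient_accumulation_steps(32, 2, 4))
```

Under that assumption, any combination where batch_size divided by (batch_size_per_device times the GPU count) comes out to 4 would match the setting described here.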

Jan 30 '25 19:01 kfzyqin