[Bug] Aborted (core dumped) after completing single inference
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
After each multi-image inference completes, the process aborts with a core dump. Streaming output is also unavailable.
Reproduction
```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig, VisionConfig
from lmdeploy.vl import load_image

import os
import json
import ast
import pandas as pd
from tqdm import tqdm

import setproctitle
setproctitle.setproctitle("TJ-InternVL_Multi_Images")

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

model = '/data2/zhangxin/model_zoo/OpenGVLab/InternVL2-40B'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))

folder_path = "/data1/ouyangtianjian/ALL_DATA_MIXED_NEW"

for task in os.listdir(folder_path):
    if "Socioeconomic" not in task and "Retrieval" not in task:
        continue
    task_name = task.split("_formatted")[0]
    done_len = 0
    if os.path.exists(f"/data1/ouyangtianjian/ALL_DATA_MIXED_NEW/results_new_multi_images/InternVL2-40B/{task_name}_test_result.jsonl"):
        with open(f"/data1/ouyangtianjian/ALL_DATA_MIXED_NEW/results_new_multi_images/InternVL2-40B/{task_name}_test_result.jsonl", "r") as f:
            done_lines = f.readlines()
            done_len = len(done_lines)
    with open(f"/data1/ouyangtianjian/ALL_DATA_MIXED_NEW/{task_name}_formatted/{task_name}_data_test_1000.json", "r") as f:
        test_data_1000 = json.load(f)
    for cell in tqdm(test_data_1000[done_len:]):
        try:
            question = cell["conversations"][0]["value"].replace("<image>", "{IMAGE_TOKEN}")
            image_paths = cell["image"]
            images = [load_image(image_path) for image_path in image_paths]
            response = pipe((question, images), use_tqdm=False, gen_config=GenerationConfig(temperature=0.3))
            with open(f"/data1/ouyangtianjian/ALL_DATA_MIXED_NEW/results_new_multi_images/InternVL2-40B/{task_name}_test_result.jsonl", "a") as fout:
                fout.write(json.dumps({
                    "id": cell["sample_id"],
                    "true_label": cell["conversations"][1]["value"],
                    "response": response.text
                }) + "\n")
        except Exception as e:
            print("ERROR:", e)
```
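For clarity, the resume logic in the script above (skipping samples that were already written to the results JSONL before the previous crash) can be isolated into a small standard-library sketch. The function names here are mine, not part of the original script:

```python
import json
import os


def count_done(result_path: str) -> int:
    """Count how many result records (one JSON object per line) already exist,
    so a restarted run can resume from that offset."""
    if not os.path.exists(result_path):
        return 0
    with open(result_path, "r") as f:
        return sum(1 for _ in f)


def append_result(result_path: str, record: dict) -> None:
    """Append one result record as a single JSON line, immediately persisted
    so progress survives a core dump."""
    with open(result_path, "a") as fout:
        fout.write(json.dumps(record) + "\n")
```

Writing each record as soon as it is produced is what makes the auto-restart wrapper workable here: after every `Aborted (core dumped)`, the relaunched script slices `test_data_1000[done_len:]` and continues from the next sample.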
Environment
absl-py==2.1.0
accelerate==0.33.0
addict==2.4.0
aiofiles==24.1.0
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiosignal==1.3.1
altair==5.4.1
annotated-types==0.7.0
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
async-lru==2.0.4
async-timeout==4.0.3
attrs==24.2.0
autocommand==2.2.2
babel==2.16.0
backports.tarfile==1.2.0
beautifulsoup4==4.12.3
bitsandbytes==0.41.0
bleach==6.1.0
blinker==1.8.2
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.0
charset-normalizer==3.3.2
click==8.1.7
cmake==3.25.0
colorama==0.4.6
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
contourpy==1.3.0
cycler==0.12.1
debugpy @ file:///croot/debugpy_1690905042057/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
decord==0.6.0
deepspeed==0.13.5
defusedxml==0.7.1
einops==0.6.1
einops-exts==0.0.4
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
fastapi==0.112.2
fastjsonschema==2.20.0
ffmpy==0.4.0
filelock==3.15.4
fire==0.6.0
flash_attn==2.3.6
fonttools==4.53.1
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2024.6.1
future==1.0.0
gdown==5.2.0
gitdb==4.0.11
GitPython==3.1.43
gradio==3.35.2
gradio_client==0.2.9
grpcio==1.66.1
h11==0.14.0
hjson==3.1.0
httpcore==1.0.5
httpx==0.27.2
huggingface-hub==0.24.6
idna==3.8
imageio==2.35.1
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1724187233579/work
importlib_resources==6.4.4
inflect==7.3.1
ipdb==0.13.13
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1719845459717/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1701831663892/work
ipywidgets @ file:///home/conda/feedstock_root/build_artifacts/ipywidgets_1724334859652/work
isoduration==20.11.0
jaraco.context==5.3.0
jaraco.functools==4.0.1
jaraco.text==3.12.1
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
Jinja2==3.1.4
joblib==1.4.2
json5==0.9.25
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter==1.1.1
jupyter-console==6.6.3
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1716472197302/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1710257447442/work
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_widgets_1724331334887/work
kiwisolver==1.4.5
latex2mathml==3.77.0
linkify-it-py==2.0.3
lit==15.0.7
lmdeploy==0.5.3
Markdown==3.7
markdown-it-py==2.2.0
markdown2==2.5.0
MarkupSafe==2.1.5
matplotlib==3.9.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
mdit-py-plugins==0.3.3
mdurl==0.1.2
mistune==3.0.2
mmcls==0.25.0
mmcv==2.2.0
mmcv-full==1.6.2
mmengine==0.10.5
mmengine-lite==0.10.4
mmsegmentation==0.30.0
model-index==0.1.11
more-itertools==10.3.0
mpmath==1.3.0
multidict==6.0.5
narwhals==1.6.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.2.1
ninja==1.11.1.1
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.68
nvidia-nvtx-cu12==12.1.105
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
opendatalab==0.0.10
openmim==0.3.9
openxlab==0.0.11
ordered-set==4.1.0
orjson==3.10.7
overrides==7.7.0
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1718189413536/work
pandas==2.2.2
pandocfilters==1.5.1
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
peft==0.11.1
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1715777629804/work
prettytable==3.11.0
prometheus_client==0.20.0
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1718047967974/work
protobuf==5.28.0
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1719274564771/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
py-cpuinfo==9.0.0
pyarrow==17.0.0
pycocoevalcap==1.2
pycocotools==2.0.8
pycparser==2.22
pycryptodome==3.20.0
pydantic==2.8.2
pydantic_core==2.20.1
pydeck==0.9.1
pydub==0.25.1
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
pynvml==11.5.3
pyparsing==3.1.4
PySocks==1.7.1
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.2
pyzmq @ file:///croot/pyzmq_1705605076900/work
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.8.0
rpds-py==0.20.0
safetensors==0.4.4
scikit-learn==1.5.1
scipy==1.13.1
semantic-version==2.10.0
Send2Trash==1.8.3
sentencepiece==0.1.99
setproctitle==1.3.3
shortuuid==1.0.13
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.6
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.38.3
streamlit==1.38.0
streamlit-image-select==0.6.0
svgwrite==1.4.3
sympy==1.13.2
tabulate==0.9.0
tenacity==8.5.0
tensorboard==2.17.1
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
termcolor==2.4.0
terminado==0.18.1
terminaltables==3.1.10
threadpoolctl==3.5.0
tiktoken==0.7.0
timm==0.9.12
tinycss2==1.3.0
tokenizers==0.15.1
toml==0.10.2
tomli==2.0.1
torch==2.3.1
torchaudio==2.3.1+cu121
torchvision==0.18.1
tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1724955920300/work
tqdm==4.66.5
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
transformers==4.37.2
triton==2.3.1
typeguard==4.3.0
types-python-dateutil==2.9.0.20240821
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.1
uc-micro-py==1.0.3
uri-template==1.3.0
urllib3==2.2.2
uvicorn==0.30.6
watchdog==4.0.2
wavedrom==2.0.3.post3
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
webcolors==24.8.0
webencodings==0.5.1
websocket-client==1.8.0
websockets==13.0.1
Werkzeug==3.0.4
widgetsnbextension @ file:///home/conda/feedstock_root/build_artifacts/widgetsnbextension_1724331337528/work
yacs==0.1.8
yapf==0.40.1
yarl==1.9.6
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1724730934107/work
Error traceback
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
0%|▏ | 1/721 [00:09<1:59:20, 9.95s/it]
Aborted (core dumped)
GPU 0 has no active processes or is corrupted. Restarting associated process.
Restarted process on GPU 0: bash -c 'source /usr/local/anaconda3/etc/profile.d/conda.sh && conda activate internvl && cd /data1/ouyangtianjian/InternVL && python /data1/ouyangtianjian/InternVL/InternVL_Multi_Images.py'
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. No dtype was provided, you should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. No dtype was provided, you should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
0%|▏ | 1/720 [00:04<57:50, 4.83s/it]
Aborted (core dumped)
GPU 0 has no active processes or is corrupted. Restarting associated process.
Restarted process on GPU 0: bash -c 'source /usr/local/anaconda3/etc/profile.d/conda.sh && conda activate internvl && cd /data1/ouyangtianjian/InternVL && python /data1/ouyangtianjian/InternVL/InternVL_Multi_Images.py'
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. No dtype was provided, you should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. No dtype was provided, you should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING] gemm_config.in is not found; using default GEMM algo
0%|▏ | 1/719 [00:09<1:50:01, 9.19s/it]
Aborted (core dumped)
GPU 0 has no active processes or is corrupted. Restarting associated process.
Restarted process on GPU 0: bash -c 'source /usr/local/anaconda3/etc/profile.d/conda.sh && conda activate internvl && cd /data1/ouyangtianjian/InternVL && python /data1/ouyangtianjian/InternVL/InternVL_Multi_Images.py'
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. No dtype was provided, you should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. No dtype was provided, you should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator.
^CTraceback (most recent call last):
File "/data1/ouyangtianjian/InternVL/InternVL_Multi_Images_AutoRecover.py", line 58, in <module>
monitor_gpus(gpu_processes)
File "/data1/ouyangtianjian/InternVL/InternVL_Multi_Images_AutoRecover.py", line 55, in monitor_gpus
time.sleep(check_interval)