Hello, I downloaded the decapoda-research/llama-7b-hf model and ran run_chatbot.sh, which produced the error below. How can I resolve it?
[2023-04-07 13:57:13,994] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0: setting --include=localhost:0
[2023-04-07 13:57:14,006] [INFO] [runner.py:550:main] cmd = /opt/miniconda3/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None examples/chatbot.py --deepspeed configs/ds_config_chatbot.json --model_name_or_path /home/jovyan/exp_2273/llama-7b-hf
[2023-04-07 13:57:15,537] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-07 13:57:15,537] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-07 13:57:15,537] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-07 13:57:15,537] [INFO] [launch.py:162:main] dist_world_size=1
[2023-04-07 13:57:15,537] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-04-07 13:57:56,585] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1808
[2023-04-07 13:57:56,586] [ERROR] [launch.py:324:sigkill_handler] ['/opt/miniconda3/bin/python', '-u', 'examples/chatbot.py', '--local_rank=0', '--deepspeed', 'configs/ds_config_chatbot.json', '--model_name_or_path', '/home/jovyan/exp_2273/llama-7b-hf'] exits with return code = -9
Thanks for your interest in LMFlow! It looks like the process was killed by the operating system: return code -9 means the subprocess received SIGKILL, which usually happens when the program consumes too much RAM (i.e., CPU memory). In that case, you can move to a server with more RAM or try a smaller model that requires less memory. Thanks 😄
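If you want to confirm that the kernel's out-of-memory killer was responsible, you can check the kernel log and the available RAM before relaunching. A minimal sketch (dmesg and free are standard Linux utilities, not part of LMFlow; reading the kernel ring buffer may require root on some systems):

```bash
# Return code -9 means the process received SIGKILL; if the kernel's OOM killer
# sent it, there will be a matching entry in the kernel log.
dmesg -T | grep -i -E "out of memory|killed process"

# Check how much RAM is free before relaunching; a 7B-parameter model in fp32
# needs roughly 28 GB of host memory for the weights alone.
free -h
```

If the log shows an OOM kill, the options above (a machine with more RAM or a smaller model) are the usual fixes.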