I tried to run inference on an A30, but an error occurred: RuntimeError: CUDA out of memory. How can I run inference on multiple cards?
Hi! We have released the code for model-parallel inference. We also suggest using the quantized version, which requires significantly less memory. Just run one of the following scripts (a conceptual sketch of the conversion step follows the script):
# On a single GPU (with more than 27GB RAM)
bash ./scripts/test_inference.sh <GPU_ID> ./tests/test_prompt.txt
# With quantization (with more than 15GB RAM)
bash ./scripts/test_inference_quantized.sh <GPU_ID> ./tests/test_prompt.txt
# On multiple GPUs (with more than 6GB RAM each; the checkpoint must first be converted into MP_SIZE partitions)
bash ./scripts/convert_ckpt_parallel.sh <LOAD_CKPT_PATH> <SAVE_CKPT_PATH> <MP_SIZE>
bash ./scripts/test_inference_parallel.sh <MP_SIZE> ./tests/test_prompt.txt
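For intuition, the conversion step splits the single checkpoint into MP_SIZE model-parallel partitions so that each GPU only has to hold a shard of the weights. The sketch below is illustrative only and is not the actual convert_ckpt_parallel.sh logic; the flat state-dict layout, the row-wise split rule, and the mp_rank_XX.pt file names are all assumptions.

import torch

def split_checkpoint(load_path: str, save_dir: str, mp_size: int) -> None:
    # Illustrative sketch only: split a single-GPU checkpoint into mp_size
    # partitions. Assumes a flat {name: tensor} state dict; the real script
    # may use different keys, nesting, and split rules.
    state_dict = torch.load(load_path, map_location="cpu")
    partitions = [dict() for _ in range(mp_size)]
    for name, tensor in state_dict.items():
        if tensor.dim() >= 2 and tensor.size(0) % mp_size == 0:
            # Shard large matrices row-wise across the model-parallel ranks.
            for rank, shard in enumerate(torch.chunk(tensor, mp_size, dim=0)):
                partitions[rank][name] = shard.clone()
        else:
            # Replicate small tensors (biases, layer norms) on every rank.
            for rank in range(mp_size):
                partitions[rank][name] = tensor
    for rank, part in enumerate(partitions):
        torch.save(part, f"{save_dir}/mp_rank_{rank:02d}.pt")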
Full error log:
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
libibverbs not available, ibv_fork_init skipped
libibverbs not available, ibv_fork_init skipped
W20230407 00:33:39.174242 1141 rpc_client.cpp:190] LoadServer 127.0.0.1 Failed at 0 times error_code 14 error_message Connection reset by peer
E0407 00:33:39.190535757 1140 server_chttp2.cc:40] {"created":"@1680827619.190465338","description":"No address added out of total 1 resolved","file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":395,"referenced_errors":[{"created":"@1680827619.190463829","description":"Failed to add any wildcard listeners","file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/tcp_server_posix.cc","file_line":342,"referenced_errors":[{"created":"@1680827619.190445769","description":"Address family not supported by protocol","errno":97,"file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":420,"os_error":"Address family not supported by protocol","syscall":"socket","target_address":"[::]:29500"},{"created":"@1680827619.190463308","description":"Unable to configure socket","fd":18,"file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":216,"referenced_errors":[{"created":"@1680827619.190459051","description":"Address already in use","errno":98,"file":"/home/ci-user/manylinux-cache-dir/release/cu117/build/grpc/src/grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc","file_line":189,"os_error":"Address already in use","syscall":"bind"}]}]}]}
F20230407 00:33:39.190577 1140 rank_info_bootstrap_server.cpp:46] Check failed: p == port() (29500 vs. 0) Port 29500 is unavailable
*** Check failure stack trace: ***
@ 0x7fb9794709ca google::LogMessage::Fail()
@ 0x7fb979470cb2 google::LogMessage::SendToLog()
@ 0x7fb979470537 google::LogMessage::Flush()
@ 0x7fb9794730a9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fb96ed95d40 oneflow::RankInfoBootstrapServer::RankInfoBootstrapServer()
@ 0x7fb96ed7fd15 oneflow::RankInfoCtrlBootstrap::RankInfoCtrlBootstrap()
@ 0x7fb97345f05b oneflow::GrpcRpcManager::Bootstrap()
@ 0x7fb972bf91c1 oneflow::EnvGlobalObjectsScope::Init()
@ 0x7fb972bfbc94 oneflow::EnvGlobalObjectsScope::EnvGlobalObjectsScope()
@ 0x7fba1b873af9 (unknown)
@ 0x7fba1b80fe0d (unknown)
@ 0x5649a0e70e14 cfunction_call
@ 0x5649a0e2acaf _PyObject_MakeTpCall
@ 0x5649a0da505b method_vectorcall.cold.2469
@ 0x5649a0e34a7a _PyObject_Call
@ 0x5649a0d9aad9 slot_tp_init.cold.2212
@ 0x5649a0e45e9b type_call
@ 0x7fbac4339bf9 pybind11_meta_call
@ 0x5649a0e2acaf _PyObject_MakeTpCall
@ 0x5649a0ec8d89 _PyEval_EvalFrameDefault
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0dee755 _PyEval_EvalFrameDefault.cold.2984
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0decae6 _PyEval_EvalFrameDefault.cold.2984
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0e70eca _PyObject_FastCallDictTstate
@ 0x5649a0e7ab79 slot_tp_init
@ 0x5649a0e2ad5f _PyObject_MakeTpCall
@ 0x5649a0ec480a _PyEval_EvalFrameDefault
@ 0x5649a0e86284 _PyFunction_Vectorcall
@ 0x5649a0dee755 _PyEval_EvalFrameDefault.cold.2984
@ 0x5649a0e85663 _PyEval_EvalCode
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1141 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 1140) of binary: /usr/local/miniconda/bin/python
Traceback (most recent call last):
File "/usr/local/miniconda/bin/torchrun", line 8, in
sys.exit(main())
File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/miniconda/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/CodeGeeX/tests/test_inference_megatron.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2023-04-07_00:33:41
host : 2c00174f74e6
rank : 0 (local_rank: 0)
exitcode : -6 (pid: 1140)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 1140
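Judging from the log, this particular crash is not the CUDA OOM itself: oneflow's RankInfoBootstrapServer aborts because port 29500 (torchrun's default master port) is already in use ("Address already in use"), e.g. because another, possibly stale, distributed job is still bound to it. Killing the stale process or launching on a free port usually resolves this part. Below is a minimal sketch for picking a free port; forwarding it via torchrun's --master_port flag (or a MASTER_PORT environment variable, if the launch script honors one) is an assumption about how your scripts are wired, so adapt as needed.

import socket

def find_free_port() -> int:
    # Ask the OS for an unused TCP port to use as the distributed master port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 lets the kernel pick any free port
        return s.getsockname()[1]

if __name__ == "__main__":
    print(find_free_port())

For example, print a free port with this helper and pass it to torchrun as --master_port=<PORT> inside test_inference_parallel.sh before rerunning; editing the script to forward that flag is a hypothetical step, not something the repository necessarily supports out of the box.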