EnergonAI
Cannot start the Bloom server
Information
V100
CUDA 11.3
transformers==4.23.1
torch==1.12.0
colossalai==0.2.5
energonai==0.0.1+torch1.12cu11.3
running for bloom-560m & bloom-7b1
Question
When I try to start the Bloom server using the examples in this link, startup stops at this point.
I do not get any errors, and I cannot send a request to http://[ip]:[host]//generation.
Is there any other information? A normal startup should look like the figure below.
I have hit the same problem. I start bloom540 with the Docker image hpcaitech/energon-ai:latest.
Information
4090
CUDA 11.3
transformers 4.24.0
colossalai 0.2.0+torch1.12cu11.3
energonai 0.0.1+torch1.12cu11.3
torch 1.12.1
running for bloom-560m & bloom-7b1
The application hangs. No other logs are printed.
Comment out random_init in run.sh. Now it can be started:
python server.py --tp ${GPU_NUM} --name ${DATASET} --dtype "int8" --max_batch_size 4 --random_model_size "560m" #--random_init False
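Since the symptom in this thread is a server that prints nothing and never accepts requests, it can help to check from a second terminal whether the port is listening at all. Below is a minimal sketch using only the Python standard library. It is not part of EnergonAI; the function names are hypothetical, and the host and port in the usage comment are placeholders for whatever your server.py is configured with.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def wait_for_server(host: str, port: int,
                    total_wait: float = 60.0, interval: float = 2.0) -> bool:
    """Poll host:port until it accepts a connection or total_wait seconds pass."""
    deadline = time.monotonic() + total_wait
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (placeholder address, adjust to your setup):
# wait_for_server("127.0.0.1", 8000, total_wait=120.0)
```

If this keeps returning False long after launch, the process is most likely stuck before the HTTP server is bound (for example during model initialization), rather than the request URL being wrong.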