Engine exception asyncio.exceptions.CancelledError in benchmark
🐛 Describe the bug
- During the benchmark, the prefillers throw an engine exception: asyncio.exceptions.CancelledError.
- The prefillers and decoders continue to work when the test is run a second time.
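For readers unfamiliar with this exception, here is a minimal, generic sketch of how asyncio.exceptions.CancelledError surfaces in plain asyncio code. It is not vLLM or AIBrix code, and it does not establish the root cause of this bug; `fake_prefill` and `run_and_cancel` are hypothetical names used only for illustration. It shows that, on Python 3.8 and later, the error raised when an awaited task is cancelled is not caught by `except Exception`.

```python
import asyncio


async def fake_prefill(delay: float) -> str:
    # Hypothetical stand-in for a long-running engine step.
    await asyncio.sleep(delay)
    return "done"


async def run_and_cancel() -> str:
    task = asyncio.create_task(fake_prefill(10.0))
    await asyncio.sleep(0)  # yield once so the task starts running
    task.cancel()           # e.g. what a disconnect or timeout handler might do
    try:
        await task
    except Exception:
        # Not reached on Python 3.8+: CancelledError derives from BaseException.
        return "caught as Exception"
    except asyncio.CancelledError:
        return "caught as CancelledError"
    return "not cancelled"


result = asyncio.run(run_and_cancel())
print(result)
```

Because CancelledError bypasses generic `except Exception` handlers, a cancellation can show up in logs as an unhandled engine exception even when the process keeps serving later requests.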
Steps to Reproduce
- Deploy vLLM in a 2P2D configuration (2 prefillers, 2 decoders) with tensor parallelism 2 (TP2) for Qwen3-32B.
- Allocate 1 RDMA NIC per pod.
- Schedule all pods onto the same node.
- Run the benchmark:
TOKENIZER="/data01/models/Qwen3-32B"
MODEL="qwen3-32b"
HOST="${LB_EXTERNAL_IP}"
PORT="80"
REQ_RATE=1.0
INPUT_LEN=8000
OUTPUT_LEN=200
python3 benchmark_serving.py --port "${PORT}" --host "${HOST}" --seed "$(date +%s)" \
  --model "${MODEL}" \
  --tokenizer "${TOKENIZER}" \
  --dataset-name random --random-input-len "${INPUT_LEN}" --random-output-len "${OUTPUT_LEN}" \
  --num-prompts 200 --burstiness 100 --request-rate "${REQ_RATE}" --metric-percentiles 95 \
  --backend openai-chat --endpoint /v1/chat/completions --routing-strategy "pd" --ignore-eos
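To give a rough sense of the load these parameters generate, the sketch below estimates the request arrival pattern and token volume. It assumes that the benchmark draws inter-arrival gaps from a gamma distribution with shape equal to the burstiness and mean equal to 1 / request rate, which is how the vLLM benchmark script describes the option; the function name `sample_intervals` and the fixed seed are illustrative only, and the numbers are estimates rather than measurements from this deployment.

```python
import random
import statistics

# Values taken from the benchmark command above.
REQ_RATE = 1.0       # --request-rate, requests per second
BURSTINESS = 100.0   # --burstiness
NUM_PROMPTS = 200    # --num-prompts
INPUT_LEN = 8000     # --random-input-len, tokens
OUTPUT_LEN = 200     # --random-output-len, tokens


def sample_intervals(rate: float, burstiness: float, n: int, seed: int = 0) -> list[float]:
    """Draw n inter-arrival gaps (seconds) from Gamma(shape=burstiness, scale=1/(rate*burstiness)).

    Assumed model: the mean gap is 1/rate, and a larger burstiness value
    gives more evenly spaced arrivals.
    """
    rng = random.Random(seed)
    scale = 1.0 / (rate * burstiness)
    return [rng.gammavariate(burstiness, scale) for _ in range(n)]


intervals = sample_intervals(REQ_RATE, BURSTINESS, NUM_PROMPTS)
mean_gap = statistics.mean(intervals)
stdev_gap = statistics.stdev(intervals)
send_window_s = sum(intervals)

# Average prompt tokens arriving per second across all prefillers.
prefill_tokens_per_s = INPUT_LEN * REQ_RATE
# Total tokens processed if every request generates OUTPUT_LEN tokens (--ignore-eos).
total_tokens = NUM_PROMPTS * (INPUT_LEN + OUTPUT_LEN)

print(f"mean gap {mean_gap:.3f}s, stdev {stdev_gap:.3f}s, send window {send_window_s:.0f}s")
print(f"prefill load {prefill_tokens_per_s:.0f} tok/s, total tokens {total_tokens}")
```

Under these assumptions, requests arrive almost evenly about one second apart, so the prefillers see a steady stream of 8000-token prompts for a little over three minutes rather than a burst.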
Expected behavior
No error reported during benchmarking.
Environment
- AIBrix 0.5.0
- VKE
- Node configuration: 8 GPUs, 8 RDMA NICs