Does it support Qwen3-235B-A22B-Instruct-2507-FP4?
Yes, since docker image 20250723:
# For A100/A800/H100/H800/H20/H200 (80G x 8):
docker run -it --rm --ipc=host --net=host --shm-size=8g --ulimit memlock=-1 \
--ulimit stack=67108864 --gpus=all -v /:/host -w /host$(pwd) \
tutelgroup/deepseek-671b:a100x8-chat-20250723 \
--try_path ./Danucore/Qwen3-235B-A22B-Instruct-2507-FP4 \
--try_path ./Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
--serve --listen_port 8000 \
--prompt "Calculate the indefinite integral of 1/sin(x) + x"
Since this model is less than 200GB in size, using 8 GPU cards would be a waste. How can we run it with only 4 cards?
You can add the environment variable to the docker command, e.g. docker run -e LOCAL_SIZE=4 -it --rm --net=host .. , to reduce the GPU count. Setting LOCAL_SIZE=2 should also work for A100 (80G) x 2. However, fewer GPUs will give lower throughput.
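For example, reusing the command above but restricted to 4 GPUs (a sketch with the same image tag and model paths as before):
# Same flags as the 8-GPU command, with LOCAL_SIZE=4 added:
docker run -e LOCAL_SIZE=4 -it --rm --ipc=host --net=host --shm-size=8g --ulimit memlock=-1 \
--ulimit stack=67108864 --gpus=all -v /:/host -w /host$(pwd) \
tutelgroup/deepseek-671b:a100x8-chat-20250723 \
--try_path ./Danucore/Qwen3-235B-A22B-Instruct-2507-FP4 \
--try_path ./Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
--serve --listen_port 8000 \
--prompt "Calculate the indefinite integral of 1/sin(x) + x"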
After the model was loaded, an error was raised. It seems that it tries to download the dataset file https://huggingface.co/datasets/ghostplant/data-collections/resolve/main/qwen3_cos_sin.npy . My environment does not allow connections to the public network. How should I handle this?
Got it. Please re-pull the image to skip the downloading procedure:
docker pull tutelgroup/deepseek-671b:a100x8-chat-20250723
Thank you. It works very well, but there are still some minor issues. The output does not meet the compatibility requirements of the OpenAI API.
The code:
import openai

client = openai.OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="hello")
response = client.chat.completions.create(
    model="qwen3-235b-f4",
    messages=[{"role": "user", "content": "who are you?"}],
)
print(response.choices[0].message.content)
In tutel, it prints:
user who are you?<|im_end|> <|im_start|>assistant Hello! I'm Qwen, a large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I can assist you with answering questions, writing, logical reasoning, programming, and more. I'm here to help with various tasks—feel free to ask me anything! 😊<|im_end|>
and in vLLM, it should be:
Hello! I'm Qwen,...
Yes, it prints the question prompt as well for now.
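Until the fixed image is available, one client-side workaround (a sketch, assuming jq and GNU sed are installed and that the server accepts the same dummy API key used in the Python snippet above) is to query the endpoint directly and cut everything before the last assistant marker:
# Call the OpenAI-compatible endpoint and strip the echoed prompt
# and chat-template markers from the returned content.
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer hello" \
  -d '{"model": "qwen3-235b-f4", "messages": [{"role": "user", "content": "who are you?"}]}' \
  | jq -r '.choices[0].message.content' \
  | sed -z 's/.*<|im_start|>assistant//; s/<|im_end|>.*//'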
@squirrelfish The next image version removes the prefill strings from the response:
docker run -e LOCAL_SIZE=8 -it --rm --ipc=host --net=host --shm-size=8g \
--ulimit memlock=-1 --ulimit stack=67108864 --gpus=all -v /:/host -w /host$(pwd) \
tutelgroup/deepseek-671b:a100x8-chat-20250801 --serve=webui --listen_port 8000 \
--prompt "Calculate the indefinite integral of 1/sin(x) + x" \
--try_path ./moonshotai/Kimi-K2-Instruct \
--try_path ./deepseek-ai/DeepSeek-R1-0528 \
--try_path ./nvidia/DeepSeek-R1-FP4 \
--try_path ./deepseek-ai/DeepSeek-R1 \
--try_path ./deepseek-ai/DeepSeek-V3-0324 \
--try_path ./deepseek-ai/DeepSeek-Prover-V2-671B \
--try_path ./Danucore/Qwen3-235B-A22B-Instruct-2507-FP4 \
--try_path ./Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
--try_path ./Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
--try_path ./Qwen/Qwen3-30B-A3B-FP8 \
--try_path ./Qwen/Qwen3-32B-FP8 \
--try_path ./Qwen/Qwen3-32B \
--try_path ./Qwen/Qwen3-0.6B
This image also includes a WebUI, so you can log in at http://0.0.0.0:8000 with your browser.