Does it support Qwen3-235B-A22B-Instruct-2507-FP4?

Open squirrelfish opened this issue 5 months ago • 8 comments

Danucore / Qwen3-235B-A22B-Instruct-2507-FP4

squirrelfish avatar Jul 24 '25 03:07 squirrelfish

Yes, since Docker image 20250723:

# For A100/A800/H100/H800/H20/H200 (80G x 8):
docker run -it --rm --ipc=host --net=host --shm-size=8g --ulimit memlock=-1 \
      --ulimit stack=67108864 --gpus=all -v /:/host -w /host$(pwd) \
      tutelgroup/deepseek-671b:a100x8-chat-20250723 \
        --try_path ./Danucore/Qwen3-235B-A22B-Instruct-2507-FP4 \
        --try_path ./Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
        --serve --listen_port 8000 \
        --prompt "Calculate the indefinite integral of 1/sin(x) + x"

ghostplant avatar Jul 24 '25 05:07 ghostplant

Since this model is less than 200GB in size, using 8 GPU cards would be a waste. How can we run it with only 4 cards?
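As a rough sanity check of the size estimate above, the weight footprint can be approximated from the parameter count and the bit width. This is only a back-of-the-envelope sketch: the 235B figure is read off the model name, 80 GB per card comes from this thread, and the helper `weight_footprint_gb` is a hypothetical name. It ignores quantization scales, any layers kept at higher precision, the KV cache, and activations, so real usage will be higher.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring all overheads."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: ~235e9 parameters (from the model name), 4 bits per weight for FP4.
fp4_gb = weight_footprint_gb(235e9, 4)
print(f"FP4 weights: ~{fp4_gb:.1f} GB")

# Compare against the total memory of 2, 4, and 8 cards with 80 GB each.
for gpus in (2, 4, 8):
    total_gb = gpus * 80
    print(f"{gpus} x 80 GB = {total_gb} GB, headroom ~{total_gb - fp4_gb:.1f} GB")
```

Under these assumptions the weights alone take well under 200 GB, which is consistent with the question of whether fewer than 8 cards would be enough.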

squirrelfish avatar Jul 25 '25 01:07 squirrelfish

You can reduce the GPU count by passing LOCAL_SIZE to the docker command: docker run -e LOCAL_SIZE=4 -it --rm --net=host .. Setting LOCAL_SIZE=2 should also work for A100(80G) x 2. However, fewer GPUs will give lower throughput.
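For illustration, here is a minimal sketch that assembles the full command line with `LOCAL_SIZE` set. It assumes the remaining flags are the same as in the 8-GPU command earlier in this thread; the helper `build_docker_cmd` is a hypothetical name, and the script only prints the command instead of running it.

```python
import os
import shlex


def build_docker_cmd(local_size: int, image: str, model_paths: list[str],
                     port: int = 8000) -> list[str]:
    """Build the docker argv, reusing the flags from the 8-GPU command in this thread."""
    cmd = [
        "docker", "run", "-e", f"LOCAL_SIZE={local_size}",
        "-it", "--rm", "--ipc=host", "--net=host", "--shm-size=8g",
        "--ulimit", "memlock=-1", "--ulimit", "stack=67108864",
        "--gpus=all", "-v", "/:/host",
        # Equivalent of `-w /host$(pwd)` in the shell version.
        "-w", "/host" + os.getcwd(),
        image,
    ]
    # Each candidate model directory is passed with its own --try_path flag.
    for path in model_paths:
        cmd += ["--try_path", path]
    cmd += ["--serve", "--listen_port", str(port)]
    return cmd


cmd = build_docker_cmd(
    local_size=4,
    image="tutelgroup/deepseek-671b:a100x8-chat-20250723",
    model_paths=["./Danucore/Qwen3-235B-A22B-Instruct-2507-FP4"],
)
# Print a copy-pasteable shell command.
print(shlex.join(cmd))
```

Changing `local_size` to 2 gives the two-card variant mentioned above.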

ghostplant avatar Jul 25 '25 02:07 ghostplant

After the model was loaded, an error was raised. It seems to be trying to download the file https://huggingface.co/datasets/ghostplant/data-collections/resolve/main/qwen3_cos_sin.npy . My environment has no access to the public network. How should I handle this?

squirrelfish avatar Jul 25 '25 08:07 squirrelfish

Got it. Please re-pull the image to skip the downloading procedure:

docker pull tutelgroup/deepseek-671b:a100x8-chat-20250723

ghostplant avatar Jul 25 '25 10:07 ghostplant

Thank you. It works very well, but there is still a minor issue: the output is not compatible with the OpenAI API.

The code:

import openai

client = openai.OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="hello")

response = client.chat.completions.create(
    model="qwen3-235b-f4",
    messages=[{"role": "user", "content": "who are you?"}],
)
print(response.choices[0].message.content)

in tutel, it prints: user who are you?<|im_end|> <|im_start|>assistant Hello! I'm Qwen, a large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I can assist you with answering questions, writing, logical reasoning, programming, and more. I'm here to help with various tasks—feel free to ask me anything! 😊<|im_end|>

In vLLM, the response contains only the reply, as expected: Hello! I'm Qwen,...
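Until the server stops echoing the prompt, a client-side workaround could strip everything up to the last assistant marker. The following is a minimal sketch, assuming the response always has the shape quoted above; `extract_assistant_reply` is a hypothetical helper, not part of Tutel or the OpenAI client.

```python
# Chat-template markers as they appear in the output quoted above.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def extract_assistant_reply(raw: str) -> str:
    """Return only the final assistant turn from a response that echoes the prompt."""
    # rpartition leaves the whole string in `tail` when the marker is absent,
    # so responses that are already clean pass through unchanged.
    _, _, tail = raw.rpartition(IM_START + "assistant")
    text = tail.strip()
    # Drop a trailing end-of-turn token if present.
    if text.endswith(IM_END):
        text = text[: -len(IM_END)]
    return text.strip()


raw = "user who are you?<|im_end|> <|im_start|>assistant Hello! I'm Qwen.<|im_end|>"
print(extract_assistant_reply(raw))
```

It would be applied to `response.choices[0].message.content` before printing.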

squirrelfish avatar Jul 25 '25 14:07 squirrelfish

Yes, for now it also prints the prompt in the response.

ghostplant avatar Jul 25 '25 16:07 ghostplant

@squirrelfish The next image version removes the prefill strings from the response:

docker run -e LOCAL_SIZE=8 -it --rm --ipc=host --net=host --shm-size=8g \
      --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all -v /:/host -w /host$(pwd) \
      tutelgroup/deepseek-671b:a100x8-chat-20250801 --serve=webui --listen_port 8000 \
        --prompt "Calculate the indefinite integral of 1/sin(x) + x" \
        --try_path ./moonshotai/Kimi-K2-Instruct \
        --try_path ./deepseek-ai/DeepSeek-R1-0528 \
        --try_path ./nvidia/DeepSeek-R1-FP4 \
        --try_path ./deepseek-ai/DeepSeek-R1 \
        --try_path ./deepseek-ai/DeepSeek-V3-0324 \
        --try_path ./deepseek-ai/DeepSeek-Prover-V2-671B \
        --try_path ./Danucore/Qwen3-235B-A22B-Instruct-2507-FP4 \
        --try_path ./Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 \
        --try_path ./Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
        --try_path ./Qwen/Qwen3-30B-A3B-FP8 \
        --try_path ./Qwen/Qwen3-32B-FP8 \
        --try_path ./Qwen/Qwen3-32B \
        --try_path ./Qwen/Qwen3-0.6B

This version also includes a WebUI, so you can open http://0.0.0.0:8000 in your browser.

ghostplant avatar Jul 30 '25 15:07 ghostplant