ghostplant comments

Results: 272 comments of ghostplant

does it support Qwen3-235B-A22B-Instruct-2507-FP4

You can run the Docker commands with `docker run -e LOCAL_SIZE=4 -it --rm --net=host ..` to reduce the GPU count. Setting `LOCAL_SIZE=2` should also work for 2 × A100 (80 GB). However,...
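A rough way to see why fewer GPUs can suffice is to estimate the raw weight footprint per GPU. This is a back-of-envelope sketch, not a measurement. It assumes all 235B parameters of Qwen3-235B-A22B (every expert, not just the ~22B active per token) are stored at 4 bits each and split evenly across `LOCAL_SIZE` GPUs. It ignores quantization scales, KV cache, activations, and runtime buffers. The function name `weight_gb_per_gpu` is hypothetical.

```python
# Back-of-envelope estimate of per-GPU weight memory.
# Assumptions (not measured): weights are split evenly across GPUs, and
# quantization scales, KV cache, activations and runtime buffers are NOT
# counted, so real memory usage will be higher than this number.


def weight_gb_per_gpu(num_params: float, bits_per_param: float, num_gpus: int) -> float:
    """Return decimal gigabytes (1e9 bytes) of raw weight storage per GPU."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    total_bytes = num_params * bits_per_param / 8  # bits -> bytes
    return total_bytes / num_gpus / 1e9


if __name__ == "__main__":
    params = 235e9  # total parameter count implied by the model name "235B"
    for gpus in (2, 4, 8):
        gb = weight_gb_per_gpu(params, bits_per_param=4, num_gpus=gpus)
        print(f"LOCAL_SIZE={gpus}: ~{gb:.2f} GB of 4-bit weights per GPU")
```

Under these assumptions, `LOCAL_SIZE=2` places about 58.75 GB of weights on each GPU, which leaves some headroom on an 80 GB A100. That is consistent with the suggestion above, but it does not by itself show that the full runtime fits.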

does it support Qwen3-235B-A22B-Instruct-2507-FP4

Got it. Please re-pull the image to skip the download step:

```sh
docker pull tutelgroup/deepseek-671b:a100x8-chat-20250723
```

does it support Qwen3-235B-A22B-Instruct-2507-FP4

Yes. For now, it also prints the question prompts.

does it support Qwen3-235B-A22B-Instruct-2507-FP4

@squirrelfish The next image version removes the prefill strings from the response:

```sh
docker run -e LOCAL_SIZE=8 -it --rm --ipc=host --net=host --shm-size=8g \
  --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all -v /:/host...
```
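If you script these launches, it can help to build the `docker run` argument list in one place so that `LOCAL_SIZE` is the only knob you change. The sketch below is hypothetical: `build_docker_run_argv` is an illustrative name, `image` is a placeholder because the quoted command is truncated before the image name, and any flags that follow `-v /:/host` in the truncated original are deliberately left out. Only the flags visible in the comment are used.

```python
import shlex
from typing import List


def build_docker_run_argv(image: str, local_size: int, host_mount: str = "/:/host") -> List[str]:
    """Assemble a `docker run` argv from the flags quoted in the comment.

    `image` is a placeholder (the original command is truncated), and
    `local_size` is passed to the container as the LOCAL_SIZE environment
    variable, which selects how many GPUs are used.
    """
    if local_size < 1:
        raise ValueError("local_size must be >= 1")
    return [
        "docker", "run",
        "-e", f"LOCAL_SIZE={local_size}",
        "-it", "--rm",
        "--ipc=host", "--net=host", "--shm-size=8g",
        "--ulimit", "memlock=-1",
        "--ulimit", "stack=67108864",  # 64 MiB stack size limit
        "--gpus=all",
        "-v", host_mount,
        image,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_docker_run_argv("IMAGE_NAME", local_size=8)))
```

Returning a list rather than a single string avoids shell-quoting issues when the argv is handed to `subprocess.run`.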

Tutel as an MoE backend in Nanotron for Qwen3-MoE 15B (128 experts, top-k=8)

May I know whether Nanotron is still active? I tried deploying it for Tutel integration, but Nanotron fails even in a uv environment. Is there any Docker environment that is...

BUG: system.init_data_model_parallel() Prevents Nsight Systems (nsys) from Tracing GPU Hardware in Distributed Mode

Thank you. This command doesn't seem to run into any issues: `nsys profile --trace-fork-before-exec=true -o tutel_fail.nsys python3 -m torch.distributed.run --nproc-per-node=2 -m tutel.examples.helloworld` Tutel's initialization still uses PyTorch's naive distributed initialization, but...
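For context, `torch.distributed.run` (torchrun) launches `--nproc-per-node` worker processes and exports rendezvous variables to each one; the default `env://` init method of `torch.distributed.init_process_group()` reads `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE`, while `LOCAL_RANK` is typically used to pick the GPU. The sketch below only reads those variables, so it runs without PyTorch installed. The names `DistEnv` and `read_torchrun_env` are hypothetical, and the single-process fallback defaults are an assumption of this sketch, not PyTorch behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistEnv:
    rank: int
    local_rank: int
    world_size: int
    master_addr: str
    master_port: int


def read_torchrun_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read the per-worker variables that torchrun exports.

    Falls back to single-process defaults when a variable is missing
    (an assumption for this sketch; PyTorch itself would raise instead).
    """
    env = os.environ if env is None else env
    return DistEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=int(env.get("MASTER_PORT", "29500")),  # torchrun's default port
    )


if __name__ == "__main__":
    info = read_torchrun_env()
    print(f"rank {info.rank}/{info.world_size} (local {info.local_rank}), "
          f"rendezvous at {info.master_addr}:{info.master_port}")
```

Launched under `python3 -m torch.distributed.run --nproc-per-node=2 ...`, each of the two workers would report its own rank, which is a quick way to confirm that the processes nsys follows are the ones doing the distributed work.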

Examples integrated with Megatron-LM

Login Open Web UI

Support for Blackwell?

Support for Blackwell?