mini-sglang
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Fixing a small typo. Thanks for the great read!
Suggestion: Change the GitHub About field to: "A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems." Currently this field is empty.
The previous comment suggested only 1 scheduler process existed, which was misleading. In reality, world_size scheduler processes are spawned (one per TP rank/GPU), but only the primary rank sends an...
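The corrected process model described above can be sketched as follows. This is a hypothetical illustration (the names `scheduler_loop` and `launch_schedulers` are not from mini-sglang): `world_size` scheduler processes are spawned, one per TP rank, and only rank 0 reports back to the parent.

```python
# Hypothetical sketch: one scheduler process per TP rank/GPU,
# with only the primary rank (rank 0) sending results upstream.
import multiprocessing as mp

# Use fork so the child processes inherit the parent's module state.
ctx = mp.get_context("fork")

def scheduler_loop(rank: int, world_size: int, result_queue) -> None:
    # Every rank runs the same scheduling loop (on its own GPU in the
    # real system); here we just produce a placeholder result.
    answer = f"rank {rank}/{world_size} done"
    # Only the primary rank communicates with the frontend.
    if rank == 0:
        result_queue.put(answer)

def launch_schedulers(world_size: int) -> str:
    queue = ctx.Queue()
    procs = [
        ctx.Process(target=scheduler_loop, args=(rank, world_size, queue))
        for rank in range(world_size)
    ]
    for p in procs:
        p.start()
    result = queue.get()  # only rank 0 ever puts a result
    for p in procs:
        p.join()
    return result
```

Spawning all ranks as full processes (rather than one scheduler plus worker threads) keeps each GPU's control loop independent, which matches the behavior the comment clarifies.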
### Description This PR implements a complete request cancellation mechanism to prevent GPU resource waste when clients disconnect. It addresses the issue described in #15. ### Verification before after fix...
**Summary** (only 94 lines of code) Adds an opt-in, per-request profiling path: clients can send `"profile": true` and mini-sglang will start a `torch.profiler` session for that request, then export a...
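The opt-in pattern described in the summary can be sketched as below. This is not the PR's code: the real implementation uses `torch.profiler` and exports a trace file, while this self-contained stand-in uses the stdlib `cProfile`, and the name `handle_request` is hypothetical.

```python
# Hypothetical sketch of an opt-in, per-request profiling path.
# The real PR wraps the request in a torch.profiler session; cProfile
# stands in here so the example runs without torch.
import cProfile
import io
import pstats

def handle_request(prompt: str, profile: bool = False) -> str:
    def generate() -> str:
        # Placeholder for the actual token-generation step.
        return prompt.upper()

    if not profile:
        # Default path: no profiling overhead for normal requests.
        return generate()

    profiler = cProfile.Profile()
    profiler.enable()
    try:
        out = generate()
    finally:
        profiler.disable()

    # In mini-sglang the trace would be exported to a file; here we
    # just render a small stats summary to show the profiler ran.
    stats = io.StringIO()
    pstats.Stats(profiler, stream=stats).sort_stats("cumulative").print_stats(5)
    return out
```

The key design point is that profiling is scoped to a single request via the `"profile": true` flag, so other in-flight requests pay no overhead.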
### Describe the bug When a client disconnects (e.g., Ctrl+C via curl), the backend (Scheduler/GPU) continues to generate tokens until `max_seq_len` is reached. This wastes GPU resources. ### Reproduction Use...
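The fix this bug calls for can be illustrated with a minimal asyncio sketch (all names here are hypothetical, not mini-sglang's API): once the client disconnect is observed, the generation task is cancelled instead of running to `max_seq_len`.

```python
# Minimal sketch: cancel in-flight generation when the client disconnects,
# so the scheduler stops producing tokens nobody will read.
import asyncio

async def generate_tokens(n_tokens: int, out: list) -> None:
    for i in range(n_tokens):
        out.append(i)
        await asyncio.sleep(0)  # yield, so cancellation can take effect

async def serve_request(disconnect_after: int, n_tokens: int) -> list:
    out: list = []
    task = asyncio.create_task(generate_tokens(n_tokens, out))
    # Stand-in for watching the HTTP connection: the client
    # "disconnects" after a few event-loop steps.
    for _ in range(disconnect_after):
        await asyncio.sleep(0)
    task.cancel()  # abort generation instead of running to max_seq_len
    try:
        await task
    except asyncio.CancelledError:
        pass
    return out

tokens = asyncio.run(serve_request(disconnect_after=3, n_tokens=1000))
```

Generation stops after a handful of tokens rather than all 1000, which is exactly the GPU waste the bug report describes.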
Since this is meant as a simple tutorial, could mini-sglang support a CPU-only mode? I want to run it on my MacBook.
Hi, I am very impressed by the project and have learned a lot! Just curious: does this minimal implementation plan to support MoE architectures in the near future? Thank you!
```
python -m minisgl --model "../../Qwen/Qwen3-0.6B"
[2025-12-20|08:57:37] INFO Parsed arguments: ServerArgs(model_path='../../Qwen/Qwen3-0.6B', tp_info=DistributedInfo(rank=0, size=1), dtype=torch.bfloat16, max_running_req=256, attention_backend='auto', cuda_graph_bs=None, cuda_graph_max_bs=None, page_size=1, memory_ratio=0.9, distributed_timeout=60.0, use_dummy_weight=False, use_pynccl=True, max_seq_len_override=None, num_page_override=None, max_extend_tokens=8192, cache_type='radix', offline_mode=False, _unique_suffix='.pid=2657', server_host='127.0.0.1',...
```
Follow the writing style of `BaseOP` and reuse `_concat_prefix` in `OPList`.