[Feature] Add top_p and top_k sampling support

Open louiswang524 opened this issue 1 week ago • 2 comments

Summary

Implements top_k and top_p (nucleus) sampling to enable diverse and controlled text generation in Mini-SGLang. This addresses the TODO items in api_server.py for supporting additional sampling parameters and brings Mini-SGLang's sampling capabilities in line with modern LLM serving frameworks.
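For readers unfamiliar with the two filters, here is a minimal, framework-free sketch of what top_k and top_p do to a probability vector. It is not the torch code from this change: the function names `top_k_filter` and `top_p_filter` are hypothetical, and the clamping and tie-breaking choices are assumptions made for illustration.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero the rest, renormalize."""
    # Assumption: clamp k so that k larger than the vocabulary keeps everything.
    k = max(1, min(k, len(probs)))
    # Token ids ranked by descending probability (ties broken by lower id).
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def top_p_filter(probs, p):
    """Keep the shortest high-probability prefix whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)  # the most likely token is always kept
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    print(top_k_filter(probs, 2))    # only the two most likely tokens survive
    print(top_p_filter(probs, 0.9))  # the least likely token is dropped
```

Both functions return a distribution that still sums to 1, so either output can be passed directly to a sampler.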

Top P only

  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Write a creative story"}],
    "temperature": 0.8,
    "top_p": 0.9,
    "max_tokens": 200
  }'

Top K only

  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Explain quantum physics"}],
    "temperature": 0.7,
    "top_k": 50,
    "max_tokens": 150
  }'

Top K and Top P:

  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Generate a poem"}],
    "temperature": 0.9,
    "top_k": 100,
    "top_p": 0.95,
    "max_tokens": 100
  }'
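To make the interaction of the three request fields concrete, the sketch below shows one way temperature, top_k, and top_p could combine when picking a single token from raw logits. It is an illustration only, not the code in this change: `sample_token` is a hypothetical name, temperature is assumed to be strictly positive, and top_k is assumed to run before top_p (as the test name test_top_k_then_top_p suggests), with top_p measured against the mass that remains after top_k.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Return one token id sampled under temperature, top_k, and top_p."""
    # Temperature-scaled softmax weights; subtracting the max avoids overflow.
    scaled = [x / temperature for x in logits]  # assumes temperature > 0
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # Token ids ranked from most to least likely.
    order = sorted(range(len(weights)), key=lambda i: (-weights[i], i))

    # top_k: keep at most k candidates (at least one).
    if top_k is not None:
        order = order[: max(1, top_k)]

    # top_p: keep the shortest prefix reaching p of the remaining mass.
    if top_p is not None:
        total = sum(weights[i] for i in order)
        cumulative, cutoff = 0.0, len(order)
        for rank, i in enumerate(order):
            cumulative += weights[i] / total
            if cumulative >= top_p:
                cutoff = rank + 1
                break
        order = order[:cutoff]

    # random.choices normalizes the weights of the surviving candidates.
    return rng.choices(order, weights=[weights[i] for i in order], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.0, -1.0]
    print(sample_token(logits, temperature=0.9, top_k=100, top_p=0.95, rng=rng))
```

With a top_k larger than the vocabulary, as in the last request above, only top_p restricts the candidate set.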

Tests:

plugins: cov-7.0.0
collecting ... collected 21 items

tests/engine/test_sampling.py::TestTopKSampling::test_top_k_basic PASSED [  4%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_single_token PASSED [  9%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_exceeds_vocab PASSED [ 14%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_batched PASSED [ 19%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_uniform_distribution PASSED [ 23%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_basic PASSED [ 28%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_very_low PASSED [ 33%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_all_tokens PASSED [ 38%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_batched PASSED [ 42%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_uniform_distribution PASSED [ 47%]
tests/engine/test_sampling.py::TestCombinedSampling::test_top_k_then_top_p PASSED [ 52%]
tests/engine/test_sampling.py::TestCombinedSampling::test_top_k_more_restrictive PASSED [ 57%]
tests/engine/test_sampling.py::TestCombinedSampling::test_top_p_more_restrictive PASSED [ 61%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_single_token_vocab PASSED [ 66%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_large_batch PASSED [ 71%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_near_zero_probabilities PASSED [ 76%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_identical_probabilities PASSED [ 80%]
tests/engine/test_sampling.py::TestSamplingProperties::test_top_k_idempotent PASSED [ 85%]
tests/engine/test_sampling.py::TestSamplingProperties::test_top_p_idempotent PASSED [ 90%]
tests/engine/test_sampling.py::TestSamplingProperties::test_probability_mass_conservation PASSED [ 95%]
tests/engine/test_sampling.py::TestSamplingProperties::test_monotonicity_top_k PASSED [100%]

=============================== tests coverage ================================
______________ coverage: platform win32, python 3.12.10-final-0 _______________

Name                                     Stmts   Miss  Cover   Missing
----------------------------------------------------------------------
python\minisgl\attention\__init__.py        34     25    26%   18-23, 32-56
python\minisgl\attention\base.py            24      9    62%   45-46, 51-52, 55-56, 59, 62, 65
python\minisgl\core.py                      90     47    48%   35-46, 50, 54, 57-58, 61, 64, 67, 76-83, 87, 91, 95, 99, 111-121, 124-125, 128-129, 133-137, 141-142, 150-151, 155-156
python\minisgl\distributed\__init__.py       3      0   100%
python\minisgl\distributed\impl.py          53     30    43%   27-31, 34-41, 49-50, 53-60, 67, 70, 79-90, 97
python\minisgl\distributed\info.py          22      9    59%   12, 15, 23-25, 29-31, 35
python\minisgl\engine\__init__.py            4      0   100%
python\minisgl\engine\config.py             41      8    80%   34, 38-40, 44-46, 50, 54
python\minisgl\engine\engine.py            111     82    26%   29, 33, 38-101, 115-140, 143-149, 155-173, 177-194, 197-211, 214-216
python\minisgl\engine\graph.py              89     71    20%   24-37, 41, 45, 62-131, 134, 137-141, 145-146, 149-155
python\minisgl\engine\sample.py             59     22    63%   24-55, 60-63, 72-82
python\minisgl\kvcache\__init__.py          19     13    32%   28-41, 45-54
python\minisgl\kvcache\base.py              37      1    97%   61
python\minisgl\layers\__init__.py            8      0   100%
python\minisgl\layers\activation.py          6      2    67%   10-12
python\minisgl\layers\attention.py          34     23    32%   29-45, 48-59
python\minisgl\layers\base.py               56     38    32%   12, 20-30, 39-52, 57, 66-68, 71-72, 80-81, 84-87, 96-99
python\minisgl\layers\embedding.py          64     48    25%   20-30, 33-41, 53-56, 65-74, 82-84, 87-108
python\minisgl\layers\linear.py             62     42    32%   24-29, 32, 43-47, 59-67, 72-79, 82-85, 95-100, 103-106
python\minisgl\layers\norm.py               25     15    40%   10-14, 17, 20, 25-30, 35-38
python\minisgl\layers\rotary.py             59     44    25%   21-37, 45-52, 62-90, 98, 109-119
python\minisgl\models\__init__.py           14      8    43%   9-19
python\minisgl\models\base.py               10      3    70%   18-20
python\minisgl\models\config.py             30      4    87%   34-37
python\minisgl\models\weight.py             73     58    21%   17, 21-49, 53-75, 79-105
python\minisgl\utils\__init__.py             7      0   100%
python\minisgl\utils\arch.py                19     11    42%   9-14, 18-21, 25, 29
python\minisgl\utils\hf.py                  10      4    60%   12-14, 19-20
python\minisgl\utils\logger.py              57     19    67%   40, 46-47, 67-87, 104-110
python\minisgl\utils\misc.py                21     13    38%   6-17, 22-23, 28, 33
python\minisgl\utils\mp.py                  89     54    39%   19-22, 25-26, 29-30, 40-43, 46-47, 50-51, 61-64, 67-68, 71, 74, 77, 80-81, 91-94, 97-98, 101-102, 112-115, 118, 121-122, 125-126, 136-140, 143-144, 147, 150-151
python\minisgl\utils\torch_utils.py         11      6    45%   12-19
----------------------------------------------------------------------
TOTAL                                     1241    709    43%
Coverage HTML written to dir htmlcov
============================= 21 passed in 36.70s =============================

louiswang524, Dec 20 '25 23:12

@DarkSharpness any feedback on this change?

louiswang524, Dec 22 '25 19:12

@louiswang524 Thanks. I will look into it tomorrow. Sorry, I'm not very familiar with sampling, so I need some more time.

BTW, personally I would prefer using one of flashinfer's implementations for better performance, but the torch implementation looks fine to me.

DarkSharpness, Dec 22 '25 19:12