[Feature] Add top_p and top_k sampling support
Summary
Implements top_k and top_p (nucleus) sampling to enable diverse and controlled text generation in Mini-SGLang. This addresses the TODO items in api_server.py for supporting additional sampling parameters and brings Mini-SGLang's sampling capabilities in line with modern LLM serving frameworks.
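For reviewers less familiar with these two filters, here is a minimal, framework-free sketch of what top_k and top_p do to a single token distribution. It is not the code in this PR (which operates on batched torch tensors). The function names `top_k_filter`, `top_p_filter`, and `sample` are hypothetical and exist only for illustration. It assumes temperature has already been applied when turning logits into probabilities, and that top_k is applied before top_p when both are set.

```python
import random


def _renormalize(probs, keep):
    """Zero every index not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most likely tokens (illustrative helper)."""
    k = max(1, min(k, len(probs)))  # a k larger than the vocab keeps everything
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)  # the most likely token is always kept
        mass += probs[i]
        if mass >= p:
            break
    return _renormalize(probs, keep)


def sample(probs, top_k=None, top_p=None, rng=random):
    """Apply the optional filters (top_k first, an assumption), then draw a token id."""
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    print(top_k_filter(probs, 2))   # only the two most likely tokens stay nonzero
    print(top_p_filter(probs, 0.9)) # the least likely token is dropped
    print(sample(probs, top_k=3, top_p=0.9))
```

Both filters renormalize after masking, so the result is still a valid distribution. Setting top_k to 1 reduces sampling to greedy decoding, and top_p of 1.0 leaves the distribution unchanged.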
Top P only:
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [{"role": "user", "content": "Write a creative story"}],
"temperature": 0.8,
"top_p": 0.9,
"max_tokens": 200
}'
Top K only:
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [{"role": "user", "content": "Explain quantum physics"}],
"temperature": 0.7,
"top_k": 50,
"max_tokens": 150
}'
Top K and Top P:
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [{"role": "user", "content": "Generate a poem"}],
"temperature": 0.9,
"top_k": 100,
"top_p": 0.95,
"max_tokens": 100
}'
Tests:
plugins: cov-7.0.0
collecting ... collected 21 items
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_basic PASSED [ 4%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_single_token PASSED [ 9%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_exceeds_vocab PASSED [ 14%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_batched PASSED [ 19%]
tests/engine/test_sampling.py::TestTopKSampling::test_top_k_uniform_distribution PASSED [ 23%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_basic PASSED [ 28%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_very_low PASSED [ 33%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_all_tokens PASSED [ 38%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_batched PASSED [ 42%]
tests/engine/test_sampling.py::TestTopPSampling::test_top_p_uniform_distribution PASSED [ 47%]
tests/engine/test_sampling.py::TestCombinedSampling::test_top_k_then_top_p PASSED [ 52%]
tests/engine/test_sampling.py::TestCombinedSampling::test_top_k_more_restrictive PASSED [ 57%]
tests/engine/test_sampling.py::TestCombinedSampling::test_top_p_more_restrictive PASSED [ 61%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_single_token_vocab PASSED [ 66%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_large_batch PASSED [ 71%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_near_zero_probabilities PASSED [ 76%]
tests/engine/test_sampling.py::TestSamplingEdgeCases::test_identical_probabilities PASSED [ 80%]
tests/engine/test_sampling.py::TestSamplingProperties::test_top_k_idempotent PASSED [ 85%]
tests/engine/test_sampling.py::TestSamplingProperties::test_top_p_idempotent PASSED [ 90%]
tests/engine/test_sampling.py::TestSamplingProperties::test_probability_mass_conservation PASSED [ 95%]
tests/engine/test_sampling.py::TestSamplingProperties::test_monotonicity_top_k PASSED [100%]
=============================== tests coverage ================================
______________ coverage: platform win32, python 3.12.10-final-0 _______________
Name Stmts Miss Cover Missing
----------------------------------------------------------------------
python\minisgl\attention\__init__.py 34 25 26% 18-23, 32-56
python\minisgl\attention\base.py 24 9 62% 45-46, 51-52, 55-56, 59, 62, 65
python\minisgl\core.py 90 47 48% 35-46, 50, 54, 57-58, 61, 64, 67, 76-83, 87, 91, 95, 99, 111-121, 124-125, 128-129, 133-137, 141-142, 150-151, 155-156
python\minisgl\distributed\__init__.py 3 0 100%
python\minisgl\distributed\impl.py 53 30 43% 27-31, 34-41, 49-50, 53-60, 67, 70, 79-90, 97
python\minisgl\distributed\info.py 22 9 59% 12, 15, 23-25, 29-31, 35
python\minisgl\engine\__init__.py 4 0 100%
python\minisgl\engine\config.py 41 8 80% 34, 38-40, 44-46, 50, 54
python\minisgl\engine\engine.py 111 82 26% 29, 33, 38-101, 115-140, 143-149, 155-173, 177-194, 197-211, 214-216
python\minisgl\engine\graph.py 89 71 20% 24-37, 41, 45, 62-131, 134, 137-141, 145-146, 149-155
python\minisgl\engine\sample.py 59 22 63% 24-55, 60-63, 72-82
python\minisgl\kvcache\__init__.py 19 13 32% 28-41, 45-54
python\minisgl\kvcache\base.py 37 1 97% 61
python\minisgl\layers\__init__.py 8 0 100%
python\minisgl\layers\activation.py 6 2 67% 10-12
python\minisgl\layers\attention.py 34 23 32% 29-45, 48-59
python\minisgl\layers\base.py 56 38 32% 12, 20-30, 39-52, 57, 66-68, 71-72, 80-81, 84-87, 96-99
python\minisgl\layers\embedding.py 64 48 25% 20-30, 33-41, 53-56, 65-74, 82-84, 87-108
python\minisgl\layers\linear.py 62 42 32% 24-29, 32, 43-47, 59-67, 72-79, 82-85, 95-100, 103-106
python\minisgl\layers\norm.py 25 15 40% 10-14, 17, 20, 25-30, 35-38
python\minisgl\layers\rotary.py 59 44 25% 21-37, 45-52, 62-90, 98, 109-119
python\minisgl\models\__init__.py 14 8 43% 9-19
python\minisgl\models\base.py 10 3 70% 18-20
python\minisgl\models\config.py 30 4 87% 34-37
python\minisgl\models\weight.py 73 58 21% 17, 21-49, 53-75, 79-105
python\minisgl\utils\__init__.py 7 0 100%
python\minisgl\utils\arch.py 19 11 42% 9-14, 18-21, 25, 29
python\minisgl\utils\hf.py 10 4 60% 12-14, 19-20
python\minisgl\utils\logger.py 57 19 67% 40, 46-47, 67-87, 104-110
python\minisgl\utils\misc.py 21 13 38% 6-17, 22-23, 28, 33
python\minisgl\utils\mp.py 89 54 39% 19-22, 25-26, 29-30, 40-43, 46-47, 50-51, 61-64, 67-68, 71, 74, 77, 80-81, 91-94, 97-98, 101-102, 112-115, 118, 121-122, 125-126, 136-140, 143-144, 147, 150-151
python\minisgl\utils\torch_utils.py 11 6 45% 12-19
----------------------------------------------------------------------
TOTAL 1241 709 43%
Coverage HTML written to dir htmlcov
============================= 21 passed in 36.70s =============================
@DarkSharpness any feedback on this change?
@louiswang524 Thanks. I will look into it tomorrow. Sorry, I'm not very familiar with sampling, so I need some more time.
BTW, personally I would prefer using flashinfer's implementation for better performance, but the torch implementation looks fine to me.