DarkSharpness
@shuaills `xgrammar` has just released v0.1.14. Maybe we should merge main and give this PR a try now.
@Superskyyy Currently, the content field of a message can't be null. If there's no message, simply set it as an empty string (like this). ```json { "content": "", "role": "assistant",...
Hello @xxll88. Could you please provide the messages in json? It seems that the content field in one of your messages happens to be `None` (= `null` in json), which...
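A minimal client-side sketch of the workaround described above: before sending, replace any `None` (or missing) `content` with an empty string. The helper name `normalize_messages` is hypothetical and not part of any library; it only assumes messages are plain dicts with `role` and `content` keys.

```python
from typing import Any


def normalize_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the messages with a None or missing content set to ''."""
    normalized = []
    for message in messages:
        fixed = dict(message)  # shallow copy so the caller's dicts stay untouched
        if fixed.get("content") is None:
            fixed["content"] = ""
        normalized.append(fixed)
    return normalized


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": None},  # serializes to null in JSON
    ]
    print(normalize_messages(history))
```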
Updated. cc @Ubospica @Seven-Streams. The rate limit policy should be refined later to strike a balance between fairness (FIFO) and shortest-first (greedy execution). FIFO may cause head-of-line blocking, while shortest-first may...
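One possible way to blend the two policies, sketched here only as an illustration and not as what the PR implements, is to rank requests by estimated cost minus an aging bonus. The `PendingRequest` class, the `pick_next` function, and the `aging_weight` parameter are all hypothetical names. With `aging_weight = 0` the choice is pure shortest-first; as it grows, waiting time dominates and the order approaches FIFO.

```python
from dataclasses import dataclass


@dataclass
class PendingRequest:
    name: str
    arrival: float  # arrival time, in seconds
    cost: float     # estimated amount of work (hypothetical unit)


def pick_next(pending: list[PendingRequest], now: float, aging_weight: float) -> PendingRequest:
    """Pick the request with the lowest (cost - aging_weight * wait) score.

    Ties are broken in favor of the earliest arrival.
    """
    return min(
        pending,
        key=lambda r: (r.cost - aging_weight * (now - r.arrival), r.arrival),
    )


if __name__ == "__main__":
    queue = [
        PendingRequest("long", arrival=0.0, cost=10.0),
        PendingRequest("short", arrival=5.0, cost=1.0),
    ]
    print(pick_next(queue, now=6.0, aging_weight=0.0).name)
    print(pick_next(queue, now=6.0, aging_weight=100.0).name)
```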
Hi! I mainly test the code with CUDA 12.9, so I can't confidently claim a minimum supported CUDA toolkit version yet. I'll try to improve compatibility where possible and also...
Thanks! I'm wondering whether we should add a globally scoped `ProfileReq`, as SGLang does. I'm uncertain about the pros and cons of a `request-scoped` profiler versus a `global-scoped` one. Personally, I...
Hi. Really glad you found the project useful. At the moment, this minimal implementation doesn't explicitly target MoE architectures. The optimization space in MoE is huge (EP, TP, fused...
I think this is good, @hnyls2002. Actually, I intended to implement `abort` long ago, but I completely forgot about it. Thanks, @Zaire404.
@louiswang524 Thanks. I will look into it tomorrow. Sorry, I'm not very familiar with sampling, so I need some more time. BTW, personally I would prefer using some flashinfer's implementation...
Personally, I don't upload my `.gitignore` (actually, I add `.gitignore` as the first line of my local `.gitignore`, so the file ignores itself). I agree we may need some public `.gitignore`....