[Feature] support torch compile cache for DeepSeek V3/R1
Checklist
- [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [ ] 2. Please use English, otherwise it will be closed.
Motivation
as titled
The time taken for each startup is currently too long when torch compile is enabled. It needs optimization.
Related resources
No response
I will work on this.
If this feature is not implemented yet, how does the TORCHINDUCTOR_CACHE_DIR option behave? Will the files in /tmp/torchinductor_root/ simply be ignored when the server starts?
According to the official PyTorch doc at https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html, torch.compile enables caching by default; it just saves the cache in /tmp/torchinductor_root, which may be cleared at some point. Explicitly setting TORCHINDUCTOR_CACHE_DIR saves the cache in a directory of your choice, which you can copy to other machines for reuse.
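For example, a minimal sketch of pinning the cache directory before anything triggers torch.compile; the path here is hypothetical:

```python
import os

# Illustrative: point Inductor at a persistent directory (hypothetical path) so
# compiled artifacts survive /tmp cleanup and can be reused across restarts.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/data/torchinductor_cache"

import torch

@torch.compile
def toy(x):
    return torch.relu(x) + 1

# The first call compiles and writes artifacts under TORCHINDUCTOR_CACHE_DIR;
# later runs can load them from the cache instead of recompiling.
toy(torch.randn(16))
```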
In that case, all that needs to be supported is saving/reading the cache to/from an arbitrary path, correct?
If I copy /tmp/torchinductor_root from one machine to another, will it still work on both machines?
Yes, I will add some content on caching to the DeepSeek docs. The cache will still work as long as both machines have the same hardware.
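A rough sketch of packaging the cache for transfer between machines, assuming a hypothetical cache path and the same GPU, driver, and PyTorch version on both sides:

```python
import shutil

# Hypothetical TORCHINDUCTOR_CACHE_DIR used on machine A.
cache_dir = "/data/torchinductor_cache"

# Archive the cache; copy the resulting tarball to machine B, extract it to the
# same path there, then start the server with TORCHINDUCTOR_CACHE_DIR set.
archive = shutil.make_archive("/tmp/torchinductor_cache", "gztar", root_dir=cache_dir)
print(f"Copy {archive} to the target machine and extract it to {cache_dir}")
```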
CUDA graph capture still takes too long (~180 s on deepseek-r1, with batch sizes [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 65]) when torch compile is enabled. It seems torch only reuses part of TORCHINDUCTOR_CACHE_DIR and recompiles the rest every time sglang starts.
The change below could reduce the time, but I am not sure about the reason we set dynamic=False in this commit: https://github.com/sgl-project/sglang/commit/07ec07ad1fa59e0f07a4fcd1b1f324123c2e2bd4
```diff
--- a/python/sglang/srt/model_executor/cuda_graph_runner.py
+++ b/python/sglang/srt/model_executor/cuda_graph_runner.py
@@ -105,7 +105,7 @@ def patch_model(
             mode=os.environ.get(
                 "SGLANG_TORCH_COMPILE_MODE", "max-autotune-no-cudagraphs"
             ),
-            dynamic=False,
+            dynamic=True,
         )
     else:
         yield model.forward
```
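For context only (this is not part of the patch), a standalone sketch of how the dynamic flag affects recompilation across batch sizes; the function and shapes are made up:

```python
import torch

def forward(x):
    return torch.nn.functional.relu(x) * 2

# With dynamic=False, each new batch size can trigger a fresh compilation; with
# dynamic=True, Inductor treats the batch dimension as symbolic, so the capture
# batch sizes (1, 2, 4, ...) can reuse the same compiled artifact.
compiled = torch.compile(forward, mode="max-autotune-no-cudagraphs", dynamic=True)

for bs in [1, 2, 4, 8, 16, 32, 64]:
    compiled(torch.randn(bs, 4096))
```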