[Bug] [SECURITY] Critical security vulnerabilities in SGLang
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.
Describe the bug
Hey, I have reported a vulnerability to the maintainers. More than 2 months ago I asked for contact details in an issue and followed your official instructions. After that, communication over email was poor. I shared the report and received confirmation that you were looking into it, but I have not received any response since then.
Now that SGLang is part of the PyTorch family, who is in charge of this project's security and of handling such reports?
@avilum Could you submit a PR to fix it? Thanks!
@zhyncs @zhaochenyang20 @Ying1123 @yings-db I was under the impression that you had been fixing it for the past 2 months. You replied to my email confirming that you were looking into it. I am very surprised now.
It is uncommon for researchers to write the security patches for the vulnerabilities they report themselves. This vulnerability is for X.AI and YOU to solve, not me. It is your responsibility to solve it, not the users'. You are currently placing your users at risk instead of responsibly fixing it like every organization or project does (do you know what a CVE means? Did you work on products prior to SGLang?).
I would love to help with the fix and the guidelines, but that's everyone's responsibility, not mine alone. You don't seem to understand the impact of the vulnerability or how responsible vulnerability disclosure works. https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html
Who is in charge of this project's security? Have you ever fixed a security vulnerability, or do you know what a vulnerability means?
SGLang users are currently at risk of remote code execution (and so is X.AI).
I forwarded it to @adarshxs.
Several critical vulnerabilities and around a dozen high-severity vulnerabilities were detected by the Aqua scanner in our production environment. At the request of our security team, we continuously remediate vulnerabilities on our own. However, the delay in updating images in production can be up to 2 months.
Some of the vulnerabilities are related to the ssh server included in the image and the version of Pillow. Here’s how to fix them:
FROM lmsysorg/sglang:v0.4.5.post3-cu125
RUN apt-get update && \
    apt-get install -y libjpeg-dev zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*
RUN pip install sglang-router
RUN sed -i -e 's/Pillow==8.3.1/Pillow==11.2.1/g' /opt/hpcx/clusterkit/bin/output/requirements.txt
RUN rm /etc/ssh/ssh_host_ecdsa_key \
    /etc/ssh/ssh_host_ed25519_key \
    /etc/ssh/ssh_host_rsa_key \
    /etc/ssh/ssh_host_ecdsa_key.pub \
    /etc/ssh/ssh_host_ed25519_key.pub \
    /etc/ssh/ssh_host_rsa_key.pub
Also, I’d like to remind you that using Docker images in a production environment with critical and high-severity vulnerabilities is STRICTLY PROHIBITED! Because of this, we constantly have to fix sglang images ourselves, followed by a lengthy approval process with the information security department, which can take 1-2 months. (And by that time, the sglang version becomes outdated, yes.)
@adarshxs Adarsh is on this. Thanks!
Will be submitting relevant fixes. Thank you!
@zhyncs @adarshxs I've conducted numerous security scans on the sglang Docker images and found that many of their vulnerabilities originate from the base image nvcr.io/nvidia/tritonserver:24.04-py3-min, which is used during the build (it has 69 vulnerabilities with known exploits). This is the relevant line of code: https://github.com/sgl-project/sglang/blob/b5be56944b6eb61b44866011f157e8df0e563bd7/docker/Dockerfile#L3 This base image is severely outdated.
The newer Docker image nvcr.io/nvidia/tritonserver:25.03-py3-min has only 17 vulnerabilities, 9 of which can be fixed by upgrading Pillow to version 11.2.1. Therefore, building the sglang Docker image on this base would significantly improve security by changing just one line of code! Could you advise if there are any blockers for this?
@Swipe4057 Thank you for your evaluations. Please feel free to open PRs to support these new images. We will review them. The current vulnerability @avilum mentions is related to another aspect of the codebase that we hope to fix soon!
@adarshxs @zhaochenyang20 I’ve made a PR; please take a look. It fixes a large number of vulnerabilities in the image, but I don’t have permission to run CI. Here’s the link: https://github.com/sgl-project/sglang/pull/5744
I am running it. Thanks! @Swipe4057
I'm not sure if the issue you raised is the same as the one mentioned in the original post. If you're uncertain whether you're referring to the same matter, I don't think it should be discussed under this topic.
Hey all, to avoid public disclosure of the vulnerability, I have opened a GHSA (GitHub Security Advisory) ticket: https://github.com/sgl-project/sglang/security/advisories/GHSA-w9wq-8grq-mp55
Please move the discussion there. A fix was started in https://github.com/sgl-project/sglang/pull/5752, but it has not addressed my report yet.
Following the merge of PR #5752, I opened another GitHub security advisory:
- Unanswered - https://github.com/sgl-project/sglang/security/advisories/GHSA-w9wq-8grq-mp55 (Pickle issue)
- Unanswered - https://github.com/sgl-project/sglang/security/advisories/GHSA-hprm-w67m-xh4w (ZMQ bind address was too permissive)
We should issue a CVE for each code change through these advisories. No one has commented on the advisories yet, and I await your response. As a reminder, the responsible disclosure period ends tomorrow (May 7th), since I reported the issue 90 days ago and have tried by every means possible to emphasize the risks to the maintainers and x.ai.
@adarshxs @zhaochenyang20 @zhyncs
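For context on the second advisory (ZMQ bind address too permissive): binding to `tcp://*:PORT` or `tcp://0.0.0.0:PORT` makes a socket reachable on every network interface, so anyone who can route to the host can send it messages, including pickled ones. Below is a minimal sketch of a guard, assuming endpoints use ZMQ's `tcp://host:port` syntax. `check_bind_endpoint` is a hypothetical helper, not part of SGLang or pyzmq, and it uses only the standard library.

```python
import ipaddress
from urllib.parse import urlsplit


def check_bind_endpoint(endpoint: str, allow_remote: bool = False) -> str:
    """Reject ZMQ-style tcp:// bind endpoints that listen beyond loopback.

    Hypothetical helper for illustration; SGLang's real config code may differ.
    """
    parts = urlsplit(endpoint)
    if parts.scheme != "tcp":
        # ipc:// and inproc:// endpoints are not reachable over the network.
        return endpoint
    host = parts.hostname
    if host == "*":
        exposed = True  # ZMQ wildcard: listen on every interface
    else:
        try:
            exposed = not ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Hostnames, interface names, or a missing host: assume exposed.
            exposed = True
    if exposed and not allow_remote:
        raise ValueError(
            f"refusing to bind {endpoint!r}: it listens beyond loopback; "
            "pass allow_remote=True only on a trusted, isolated network"
        )
    return endpoint


print(check_bind_endpoint("tcp://127.0.0.1:5555"))  # tcp://127.0.0.1:5555
for ep in ("tcp://0.0.0.0:5555", "tcp://*:5555"):
    try:
        check_bind_endpoint(ep)
    except ValueError as err:
        print(err)
```

Defaulting to loopback and requiring an explicit opt-in keeps a misconfigured deployment from silently exposing the socket; the check is deliberately conservative and rejects even `localhost` because it cannot be verified as a loopback address without resolving it.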
@adarshxs @junliu-mde @zhaochenyang20 Hey, is there an official response or update on this vulnerability?
You already started fixing some of the issues and merged fixes following my disclosure in https://github.com/sgl-project/sglang/pull/5752
vLLM, TensorRT-LLM, and Meta-Llama had fixed them promptly. When should we expect the pickle vulnerability to be fixed in SGLang?
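For context on the pickle advisory: `pickle.loads` will call any importable callable that a crafted payload names (via `__reduce__`), so unpickling bytes received from a network socket amounts to remote code execution. The sketch below follows the "restricting globals" pattern from the Python `pickle` documentation, using a hypothetical `RestrictedUnpickler` that resolves only an explicit allowlist. It is not SGLang's fix; the allowlist is an assumption, and the `pickle` documentation itself recommends considering alternative formats when security is a concern.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs the unpickler may resolve.
# Plain dicts, lists, strings and ints need no globals at all; these entries
# cover set/frozenset pickled with protocol 3, which reference builtins.
SAFE_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Stand-in for pickle.loads when the bytes come from an untrusted peer."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    # __reduce__ makes pickle call an arbitrary callable at load time.
    # A real attacker would name os.system; print is a harmless stand-in.
    def __reduce__(self):
        return (print, ("attacker-controlled code ran",))


benign = pickle.dumps({"rid": "req-1", "tokens": [1, 2, 3]})
print(restricted_loads(benign))  # {'rid': 'req-1', 'tokens': [1, 2, 3]}

try:
    restricted_loads(pickle.dumps(Exploit()))
except pickle.UnpicklingError as err:
    print("rejected:", err)  # rejected: global 'builtins.print' is forbidden
```

An allowlist narrows the attack surface but still trusts the pickle machinery; switching the wire format to a data-only encoding removes the code-execution path entirely.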
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
https://www.oligo.security/blog/shadowmq-how-code-reuse-spread-critical-vulnerabilities-across-the-ai-ecosystem @avilum Thanks for the report.
@sundar24295s @adarshxs Any ideas on how to address it?
This does not impact isolated clusters. Still, we will have a fix soon using HMAC authentication or a safer deserialization method.
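A minimal sketch of the HMAC option, assuming a 32-byte secret shared by all SGLang processes out of band (how SGLang would distribute it is not specified here). Each frame is prefixed with an HMAC-SHA256 tag, and the receiver checks it with `hmac.compare_digest` before deserializing anything. The payload is JSON, standing in for a "safer deserialization method". `sign` and `verify` are hypothetical names, not SGLang APIs.

```python
import hashlib
import hmac
import json
import secrets

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag computed over it."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(key: bytes, frame: bytes) -> bytes:
    """Return the payload only if its tag matches; raise otherwise."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short to contain a tag")
    tag, payload = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time, so timing does not leak the tag.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed; refusing to deserialize")
    return payload


# Hypothetical usage: in a deployment the key would be shared between
# processes out of band; here it is simply generated locally.
key = secrets.token_bytes(32)
message = json.dumps({"rid": "req-1", "tokens": [1, 2, 3]}).encode()

frame = sign(key, message)
print(json.loads(verify(key, frame)))  # {'rid': 'req-1', 'tokens': [1, 2, 3]}

tampered = frame[:-1] + b"?"
try:
    verify(key, tampered)
except ValueError as err:
    print(err)  # HMAC verification failed; refusing to deserialize
```

HMAC only authenticates the sender: it provides no confidentiality or replay protection, and anyone holding the key could still send a malicious payload if pickle remained the wire format, which is why pairing it with a data-only format is the stronger option.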
@adarshxs Great, looking forward to the fix
