seldon-core
Custom image cannot be deployed due to werkzeug version
Describe the bug
We built a custom inference image which can be deployed successfully using SeldonDeployment. Unfortunately the image did not pass our security check:
** DISPUTED ** Improper parsing of HTTP requests in Pallets Werkzeug v2.1.0 and below allows attackers to perform HTTP Request Smuggling using a crafted HTTP request with multiple requests included inside the body. NOTE: the vendor's position is that this behavior can only occur in unsupported configurations involving development mode and an HTTP server from outside the Werkzeug project.
To pass the security scan, we added "Werkzeug==2.1.1" to our package dependencies and rebuilt the image. The image passed the security scan but failed during the k8s deployment. The error is shown below:
Traceback (most recent call last):
  File "/usr/local/bin/seldon-core-microservice", line 5, in <module>
    from seldon_core.microservice import main
  File "/usr/local/lib/python3.8/site-packages/seldon_core/microservice.py", line 17, in <module>
    from seldon_core import wrapper as seldon_microservice
  File "/usr/local/lib/python3.8/site-packages/seldon_core/wrapper.py", line 6, in <module>
    from flask import Flask, Response, request, send_from_directory
  File "/usr/local/lib/python3.8/site-packages/flask/__init__.py", line 21, in <module>
    from .app import Flask
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 32, in <module>
    from werkzeug.wrappers import BaseResponse
ImportError: cannot import name 'BaseResponse' from 'werkzeug.wrappers' (/usr/local/lib/python3.8/site-packages/werkzeug/wrappers/__init__.py)
This issue is a blocker that stops us from moving our local deployment to our Dev Env. Thank you!
To reproduce
Here is our requirements.txt:
Werkzeug==2.1.1
joblib
numpy==1.22.1
scipy==1.7.3
pandas==1.3.5
scikit-learn==1.0.2
xgboost==1.6.1
seldon-core
And here is our Dockerfile:
FROM python:3.8-slim
WORKDIR /app

# Install python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy source code
COPY . .

# Port for GRPC
EXPOSE 5000
# Port for REST
EXPOSE 9000

# Define environment variables
ENV MODEL_NAME XgboostModel
ENV SERVICE_TYPE MODEL

# Changing folder to default user
RUN chown -R 8888 /app

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
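Because seldon-core is unpinned in the requirements.txt above, it can help to confirm which Flask and Werkzeug versions pip actually resolved inside the image. The following is a minimal sketch, not part of the original report; it uses only the standard-library `importlib.metadata` module (available from Python 3.8), and the helper names `installed_version` and `report` are illustrative.

```python
from importlib import metadata  # standard library since Python 3.8

# Distribution names taken from the requirements.txt and traceback in this report.
PACKAGES = ("seldon-core", "Flask", "Werkzeug")


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(packages=PACKAGES):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Saved under a hypothetical name such as check_versions.py and run with the image's Python interpreter, this prints one line per package, which makes it easy to see whether the Flask release that was installed matches the pinned Werkzeug.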
Expected behaviour
Environment
Model Details
- Images of your model: [Output of: kubectl get seldondeployment -n <yourmodelnamespace> <seldondepname> -o yaml | grep image: where <yourmodelnamespace>]
- Logs of your model: [You can get the logs of your model by running kubectl logs -n <yourmodelnamespace> <seldonpodname> <container>]
Is this related to https://github.com/SeldonIO/seldon-core/issues/4014 and https://github.com/SeldonIO/seldon-core/issues/4017?
Yes, but the thing is that the workaround is using 2.0.3, which fails our security scan.
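The conflict described here can be sketched as two version constraints that never overlap. This is only an illustration under stated assumptions: the CVE range ("v2.1.0 and below") comes from the text quoted in this issue, while the assumption that `BaseResponse` is no longer exported from `werkzeug.wrappers` starting with 2.1.0 is inferred from the ImportError seen with 2.1.1. All function names are hypothetical, and only plain numeric versions are parsed.

```python
def parse_version(text):
    """Turn '2.1.1' into (2, 1, 1); only plain numeric releases are handled."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


# Assumption: BaseResponse is gone from werkzeug.wrappers starting with 2.1.0,
# which is consistent with the ImportError reported with Werkzeug 2.1.1.
BASE_RESPONSE_REMOVED_IN = (2, 1, 0)
# From the CVE text quoted in this issue: "v2.1.0 and below" is affected.
CVE_LAST_AFFECTED = (2, 1, 0)


def flask_import_succeeds(werkzeug_version):
    """True if the Flask release in the traceback can still import BaseResponse."""
    return parse_version(werkzeug_version) < BASE_RESPONSE_REMOVED_IN


def passes_scan(werkzeug_version):
    """True if the version is newer than the range named in the CVE text."""
    return parse_version(werkzeug_version) > CVE_LAST_AFFECTED


def usable(werkzeug_version):
    """True only if both constraints hold at the same time."""
    return flask_import_succeeds(werkzeug_version) and passes_scan(werkzeug_version)


if __name__ == "__main__":
    for candidate in ("2.0.3", "2.1.0", "2.1.1"):
        print(candidate, flask_import_succeeds(candidate), passes_scan(candidate))
```

Under these assumptions no Werkzeug version satisfies both checks, so pinning alone cannot resolve the problem while the image still uses the Flask release shown in the traceback.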
Can you share the CVE for werkzeug? The last security scan ran two days ago and we haven't been able to spot any CVEs https://github.com/SeldonIO/seldon-core/actions/runs/2747805115.
For context you can find our security policy here https://github.com/SeldonIO/seldon-core/blob/master/SECURITY.md - we use snyk to scan all the dependencies
https://nvd.nist.gov/vuln/detail/CVE-2022-29361
Thank you for sharing, @jackxue-ecs. We're checking why the CVE is not appearing; in snyk it seems to be classified as lower severity. What are you using for the scan? It also seems to be marked as disputed, so we need more context to decide how to proceed.
We just reviewed this with the full dependency list installed, including Werkzeug, and we still can't verify it with the snyk scans or against the Python CVE database. It would be great if you could provide the further details requested.
@jackxue-ecs are you able to provide more details? We'd need information including the tools and exact commands you are running to find this, as we haven't been able to replicate or validate it. Please provide the commands and setup needed to replicate; otherwise we won't be able to address or remediate it, as there's no way for us to check.
@axsaucedo hello, we are using the SaaS version of Aqua Scanner to scan these images. There is no specific command; we are using their scannercli tool. Please let me know what other details are required.
@jkakkar7 thank you for the context. Is there a way for us to run these tests? We have a set of security tests, but they are not highlighting this CVE (presumably because it's disputed). We need to be able to test it in order to validate that a change indeed solves the issue; otherwise we can't really test on the PR, and this image is the base for various others, including the explainer/detector servers and all the prepackaged servers.
For further context, I am just now working on the OpenShift release, and the security scan there did not report anything on the 1.14.0 image, so it seems this was not flagged there either.
@axsaucedo Our org won't allow this to go to production if Werkzeug is not updated to a later version, even though the CVE is disputed (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-29361). I saw it is pinned to 2.0.3, which is not a good solution either. Can the team figure out an option for us?
It is unlikely we will be able to solve this as a patch release on the 1.14.x line. This is due to the complicated dependency relationships between Flask, Werkzeug and a few other sub-dependencies (see https://github.com/SeldonIO/seldon-core/issues/4017 and https://github.com/pallets/flask/issues/4455).
We will be aiming to solve this in master by moving to Flask 2.x. As this is a major dependency change, we cannot do it as a patch release. There are also risks involved across all the components that rely on this, so although the PR is open, the timelines are unknown.
I was just looking more closely at the upstream CVE and the vendor's stance.
This cve is invalid, if you're running the dev server in production you have bigger security issues. The dev server is never intended to be run in production. The cve is also misattributed, it is about Python's http.server.
Although we provide an option to run Flask's dev server, we only do so in debug mode. For production deployments we have a Gunicorn-based setup. Also, according to David, the CVE is misattributed.
I understand this is a disputed CVE. However, pinning Werkzeug to a particular version is not an ideal path. Our Security Team has a tough stance and cannot allow this in production if we cannot resolve this old version issue.