[Bug]: After sending a streaming request to a custom endpoint, the entire LiteLLM proxy gateway crashed (if this happened in a production environment, it could have a major impact)
What happened?
1. My configuration is shown below (in the ConfigMap inside deployment.yaml). I changed the api_base parameter so requests are sent to a custom endpoint I defined for debugging and analysis. The endpoint's implementation is very simple: it just returns a plain string. Its actual implementation code is included below as my_endpoint.py.
2. After deployment, everything worked fine.
3. After I sent a simple streaming request to the LiteLLM proxy gateway, the entire gateway crashed!
curl --location 'http://litellm-test-service.litellm-test:4000/chat/completions' \
--header 'Authorization: Bearer my,.psd' \
--header 'Content-Type: application/json' \
--data '{
    "model": "test-gemini-2.5-flash-preview-04-17",
    "messages": [
        {
            "role": "user",
            "content": "how are you"
        }
    ],
    "stream": true
}'
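For reference, the equivalent request through the openai Python SDK (a minimal sketch, assuming the same proxy URL and master key as in the curl command above) exercises the same /chat/completions streaming path:

# repro_stream.py -- hypothetical helper, equivalent to the curl request above
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm-test-service.litellm-test:4000",  # the SDK appends /chat/completions
    api_key="my,.psd",
)

stream = client.chat.completions.create(
    model="test-gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "how are you"}],
    stream=True,
)

for chunk in stream:  # iterating the stream is what hits the proxy's streaming handler
    print(chunk)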
4. I believe this bug is severe: if an LLM provider ever returns a similar plain-text (non-streaming) response body, it could make the entire LiteLLM proxy unavailable, which would lead to serious incidents in a production environment. I sincerely hope this issue can be addressed and fixed as soon as possible.
5. Here is the detailed deployment file (deployment.yaml):
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-test
  namespace: litellm-test
  labels:
    app: litellm-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm-test
  template:
    metadata:
      labels:
        app: litellm-test
    spec:
      containers:
        - name: litellm-test
          image: ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.67.0-stable
          imagePullPolicy: Always
          env:
            - name: LITELLM_MASTER_KEY
              value: "my,.psd"
            - name: LITELLM_SALT_KEY
              value: "my,.psd"
            - name: MY_GCP_KEY
              value: "my,.gcp,.psd"
          args:
            - "--config"
            - "/app/config.yaml"
          volumeMounts:
            - name: config-volume
              mountPath: /app/config.yaml
              subPath: config.yaml
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 10
            timeoutSeconds: 10
          startupProbe:
            httpGet:
              path: /health/readiness
              port: 4000
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 25
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 4000
            initialDelaySeconds: 5
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: "0.1"
              memory: 100Mi
      volumes:
        - name: config-volume
          configMap:
            name: litellm-test-config
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-test-service
  namespace: litellm-test
  labels:
    app: litellm-test
spec:
  selector:
    app: litellm-test
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-test-config
  namespace: litellm-test
data:
  config.yaml: |
    model_list:
      - model_name: test-gemini-2.5-flash-preview-04-17
        litellm_params:
          model: gemini/gemini-2.5-flash-preview-04-17
          api_key: "os.environ/MY_GCP_KEY"
          api_base: "http://request-test-service.zrob:9093/test" # just an svc endpoint in k8s for me to debug
    general_settings:
      proxy_batch_write_at: 60 # Batch write spend updates every 60s
      database_connection_pool_limit: 20
      disable_spend_logs: True
      disable_error_logs: True
      allow_requests_on_db_unavailable: True
my_endpoint.py
from flask import Flask, request
from gevent import pywsgi
import os

FLASK_APP = Flask(__name__)
FLASK_APP.config['SECRET_KEY'] = os.urandom(24)

@FLASK_APP.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'])
@FLASK_APP.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'])
def test(path):
    # Log the incoming request, then return a plain string (not an SSE stream).
    print(f"method:{request.method}")
    print(f'path:{path}')
    print('headers')
    for header in request.headers:
        print(f'{header[0]}: {header[1]}')
    print('body')
    print(request.get_data().decode('utf-8'))
    return "succeed"

if __name__ == '__main__':
    server = pywsgi.WSGIServer(('0.0.0.0', 9093), FLASK_APP)
    server.serve_forever()
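Note that my_endpoint.py returns the plain string "succeed", which is not a valid server-sent-events (SSE) streaming body. As a control case, a variant that streams properly framed data: chunks could help confirm the proxy only crashes on the malformed body. The sketch below is my assumption of an OpenAI-style SSE shape; the exact chunk schema LiteLLM expects for a gemini/ model may differ, so treat it purely as an illustration of SSE framing.

control_endpoint.py (hypothetical, not part of the original setup)
import json
from flask import Flask, Response
from gevent import pywsgi

FLASK_APP = Flask(__name__)

@FLASK_APP.route('/', defaults={'path': ''}, methods=['POST'])
@FLASK_APP.route('/<path:path>', methods=['POST'])
def stream_ok(path):
    # Stream well-formed SSE chunks instead of a bare string.
    def generate():
        chunk = {
            "id": "chatcmpl-test",
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": "hello"}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"
    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    server = pywsgi.WSGIServer(('0.0.0.0', 9093), FLASK_APP)
    server.serve_forever()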
Relevant log output
INFO: 172.31.30.183:59334 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:59348 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54824 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54822 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:47718 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:47734 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:36260 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:36268 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:40100 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:40116 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:37964 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:37968 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:50638 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:50652 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:56238 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:56232 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54036 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54038 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:42964 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:42966 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:45448 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:45446 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:48096 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:48100 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.10.51:44246 - "POST /chat/completions HTTP/1.1" 200 OK
Are you a ML Ops Team?
No
What LiteLLM version are you on ?
v1.67.0-stable
Twitter / LinkedIn details
No response