[Bug]: After sending a streaming request to a custom endpoint, the entire LiteLLM proxy gateway crashed (if this happened in a production environment, it could have a major impact)
What happened?
1. My configuration is shown below (in the ConfigMap inside deployment.yaml). I changed the api_base parameter so requests are sent to a custom endpoint I defined for debugging and analysis. The endpoint's implementation is very simple: it just returns a plain string. Its actual implementation code is included below as my_endpoint.py.
2. After deployment, everything worked fine.
3. After I sent a simple streaming request to the LiteLLM proxy gateway, the entire gateway crashed!
curl --location 'http://litellm-test-service.litellm-test:4000/chat/completions' \
--header 'Authorization: Bearer my,.psd' \
--header 'Content-Type: application/json' \
--data '{
    "model": "test-gemini-2.5-flash-preview-04-17",
    "messages": [
        {
            "role": "user",
            "content": "how are you"
        }
    ],
    "stream": true
}'
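For reference, the equivalent request through the openai Python SDK (a minimal sketch, assuming the same proxy URL and master key as in the curl command above) exercises the same /chat/completions streaming path:

# repro_stream.py -- hypothetical helper, equivalent to the curl request above
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm-test-service.litellm-test:4000",  # the SDK appends /chat/completions
    api_key="my,.psd",
)

stream = client.chat.completions.create(
    model="test-gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "how are you"}],
    stream=True,
)

for chunk in stream:  # iterating the stream is what hits the proxy's streaming handler
    print(chunk)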
4. I believe this bug is severe: if an LLM provider ever returns a similar plain-text (non-streaming) response body, it could make the entire LiteLLM proxy unavailable, which would lead to serious incidents in a production environment. I sincerely hope this issue can be addressed and fixed as soon as possible.
5. Here is the detailed deployment file (deployment.yaml):
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-test
  namespace: litellm-test
  labels:
    app: litellm-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm-test
  template:
    metadata:
      labels:
        app: litellm-test
    spec:
      containers:
        - name: litellm-test
          image: ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.67.0-stable
          imagePullPolicy: Always
          env:
            - name: LITELLM_MASTER_KEY
              value: "my,.psd"
            - name: LITELLM_SALT_KEY
              value: "my,.psd"
            - name: MY_GCP_KEY
              value: "my,.gcp,.psd"
          args:
            - "--config"
            - "/app/config.yaml"
          volumeMounts:
            - name: config-volume
              mountPath: /app/config.yaml
              subPath: config.yaml
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 10
            timeoutSeconds: 10
          startupProbe:
            httpGet:
              path: /health/readiness
              port: 4000
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 25
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 4000
            initialDelaySeconds: 5
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: "0.1"
              memory: 100Mi
      volumes:
        - name: config-volume
          configMap:
            name: litellm-test-config
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-test-service
  namespace: litellm-test
  labels:
    app: litellm-test
spec:
  selector:
    app: litellm-test
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-test-config
  namespace: litellm-test
data:
  config.yaml: |
    model_list:
      - model_name: test-gemini-2.5-flash-preview-04-17
        litellm_params:
          model: gemini/gemini-2.5-flash-preview-04-17
          api_key: "os.environ/MY_GCP_KEY"
          api_base: "http://request-test-service.zrob:9093/test" # just an svc endpoint in k8s for me to debug
    general_settings:
      proxy_batch_write_at: 60 # Batch write spend updates every 60s
      database_connection_pool_limit: 20
      disable_spend_logs: True
      disable_error_logs: True
      allow_requests_on_db_unavailable: True
my_endpoint.py
from flask import Flask, request
from gevent import pywsgi
import os

FLASK_APP = Flask(__name__)
FLASK_APP.config['SECRET_KEY'] = os.urandom(24)

@FLASK_APP.route('/', defaults={'path': ''}, methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'])
@FLASK_APP.route('/<path:path>', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'])
def test(path):
    # Log the incoming request, then return a plain string (not an SSE stream).
    print(f"method:{request.method}")
    print(f'path:{path}')
    print('headers')
    for header in request.headers:
        print(f'{header[0]}: {header[1]}')
    print('body')
    print(request.get_data().decode('utf-8'))
    return "succeed"

if __name__ == '__main__':
    server = pywsgi.WSGIServer(('0.0.0.0', 9093), FLASK_APP)
    server.serve_forever()
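Note that my_endpoint.py returns the plain string "succeed", which is not a valid server-sent-events (SSE) streaming body. As a control case, a variant that streams properly framed data: chunks could help confirm the proxy only crashes on the malformed body. The sketch below is my assumption of an OpenAI-style SSE shape; the exact chunk schema LiteLLM expects for a gemini/ model may differ, so treat it purely as an illustration of SSE framing.

control_endpoint.py (hypothetical, not part of the original setup)
import json
from flask import Flask, Response
from gevent import pywsgi

FLASK_APP = Flask(__name__)

@FLASK_APP.route('/', defaults={'path': ''}, methods=['POST'])
@FLASK_APP.route('/<path:path>', methods=['POST'])
def stream_ok(path):
    # Stream well-formed SSE chunks instead of a bare string.
    def generate():
        chunk = {
            "id": "chatcmpl-test",
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": "hello"}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"
    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    server = pywsgi.WSGIServer(('0.0.0.0', 9093), FLASK_APP)
    server.serve_forever()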
Relevant log output
INFO: 172.31.30.183:59334 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:59348 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54824 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54822 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:47718 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:47734 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:36260 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:36268 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:40100 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:40116 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:37964 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:37968 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:50638 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:50652 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:56238 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:56232 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54036 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:54038 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:42964 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:42966 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:45448 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:45446 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.30.183:48096 - "GET /health/liveliness HTTP/1.1" 200 OK
INFO: 172.31.30.183:48100 - "GET /health/readiness HTTP/1.1" 200 OK
INFO: 172.31.10.51:44246 - "POST /chat/completions HTTP/1.1" 200 OK
Are you a ML Ops Team?
No
What LiteLLM version are you on ?
v1.67.0-stable
Twitter / LinkedIn details
No response