OpenTelemetry Output Retries on Bad Requests

Open Jared-Miller opened this issue 5 months ago • 0 comments

Bug Report

Describe the bug

If the telemetry server responds with 400 Bad Request, the OpenTelemetry output sets its return code to FLB_RETRY, so the request is retried. The OTLP spec states that a client must not retry bad requests:

The client MUST NOT retry the request when it receives HTTP 400 Bad Request response.

Both opentelemetry_legacy_post and opentelemetry_post behave this way. The spec also indicates that the only retryable status codes are 429, 502, 503, and 504, so the following is roughly how the response should be handled, if I'm understanding the spec and the Fluent Bit codebase correctly.

        if (c->resp.status < 200 || c->resp.status > 205) {
            if (ctx->log_response_payload &&
                c->resp.payload != NULL &&
                c->resp.payload_size > 0) {
                flb_plg_error(ctx->ins, "%s:%i, HTTP status=%i\n%.*s",
                              ctx->host, ctx->port,
                              c->resp.status,
                              (int) c->resp.payload_size,
                              c->resp.payload);
            }
            else {
                flb_plg_error(ctx->ins, "%s:%i, HTTP status=%i",
                              ctx->host, ctx->port, c->resp.status);
            }

            /* Retryable status codes according to the OTLP spec */
            if (c->resp.status == 429 || c->resp.status == 502 ||
                c->resp.status == 503 || c->resp.status == 504) {
                out_ret = FLB_RETRY;
            }
            else if (c->resp.status == 400) {
                /* OTLP spec says 400 must not be retried */
                out_ret = FLB_ERROR;
            }
            else {
                /*
                 * OTLP spec says to treat status codes it does not document
                 * "according to HTTP specifications"
                 */
                out_ret = FLB_ERROR;
            }
        }
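The decision in the snippet above can also be isolated as a pure function, which makes it easy to unit-test outside the plugin. The following is only a sketch under stated assumptions: `otlp_status_to_ret` and the `sketch_ret` enum are hypothetical names that do not exist in Fluent Bit, and the enum values are local stand-ins for FLB_OK, FLB_RETRY, and FLB_ERROR rather than the real macros. It only covers the case where an HTTP response was actually received.

```c
/*
 * Local stand-ins for Fluent Bit's FLB_OK / FLB_RETRY / FLB_ERROR.
 * The names and values are assumptions made for this sketch only.
 */
enum sketch_ret {
    SKETCH_OK,
    SKETCH_RETRY,
    SKETCH_ERROR
};

/*
 * Hypothetical helper: map an OTLP/HTTP response status to a flush result.
 * Uses the same 200..205 success window as the snippet above.
 */
enum sketch_ret otlp_status_to_ret(int status)
{
    if (status >= 200 && status <= 205) {
        return SKETCH_OK;
    }

    switch (status) {
    case 429:
    case 502:
    case 503:
    case 504:
        /* The only status codes treated as retryable here */
        return SKETCH_RETRY;
    default:
        /* 400 and every other non-success status: do not retry */
        return SKETCH_ERROR;
    }
}
```

With a helper like this, both opentelemetry_legacy_post and opentelemetry_post could share one mapping instead of duplicating the status checks, though whether that fits the plugin's structure is a judgment call for the maintainers.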

To Reproduce

  • Run Fluent Bit with an OpenTelemetry output.
  • Wait for the server to respond with a 400 (429, 502, 503, and 504 are retried as well, but the spec allows that).
  • See Fluent Bit retry the send until restarted.
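
Waiting for a real backend to return a 400 can take a while, so a stub endpoint that always answers 400 Bad Request may make the steps above easier to reproduce. The following is a minimal sketch for POSIX systems, not part of Fluent Bit: `stub_listen` and `stub_serve_one` are hypothetical helper names, the stub binds to loopback only, and it ignores the request contents entirely.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Fixed reply: 400 with an 11-byte plain-text body */
static const char BAD_REQUEST[] =
    "HTTP/1.1 400 Bad Request\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 11\r\n"
    "Connection: close\r\n"
    "\r\n"
    "bad request";

/* Listen on 127.0.0.1:port. Returns the listening fd, or -1 on error. */
int stub_listen(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one connection and answer 400. Returns 0 on success, -1 on error. */
int stub_serve_one(int listen_fd)
{
    char buf[4096];
    struct timeval tv = { 1, 0 };   /* 1 second receive timeout */
    int rc = 0;
    int conn = accept(listen_fd, NULL, NULL);

    if (conn < 0) {
        return -1;
    }
    setsockopt(conn, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* Read the first chunk of the request; its contents are ignored */
    if (read(conn, buf, sizeof(buf)) < 0) {
        rc = -1;
    }
    else if (write(conn, BAD_REQUEST, sizeof(BAD_REQUEST) - 1) < 0) {
        rc = -1;
    }

    /* Signal end of reply, then drain leftover input until EOF or timeout */
    shutdown(conn, SHUT_WR);
    while (read(conn, buf, sizeof(buf)) > 0) {
        /* discard */
    }
    close(conn);
    return rc;
}
```

Calling `stub_listen` once and then `stub_serve_one` in a loop from a small `main`, with the OpenTelemetry output's host and port pointed at the stub, should make every flush receive a 400 so the retry behavior can be observed directly.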

Expected behavior

FLB_ERROR is returned when an HTTP 400 Bad Request is received.

Your Environment

  • Version used: 4.0.1
  • Filters and plugins: out_opentelemetry

Additional context

Fluent Bit will attempt to retry the failed record until the process is restarted.

Jared-Miller • Jun 16 '25 16:06