fluent-bit
OpenTelemetry Output Retries on Bad Requests
Bug Report
Describe the bug
If the telemetry server responds with 400 Bad Request, the return code is set to FLB_RETRY, causing the request to be retried. The OTLP spec indicates that a client must not retry bad requests.
"The client MUST NOT retry the request when it receives HTTP 400 Bad Request response."
Both opentelemetry_legacy_post and opentelemetry_post behave this way. The spec also indicates that the only retryable status codes are 429, 502, 503, and 504, so the following is roughly how the response should be handled, if I'm understanding the spec and the Fluent Bit codebase correctly.
    if (c->resp.status < 200 || c->resp.status > 205) {
        if (ctx->log_response_payload &&
            c->resp.payload != NULL &&
            c->resp.payload_size > 0) {
            flb_plg_error(ctx->ins, "%s:%i, HTTP status=%i\n%.*s",
                          ctx->host, ctx->port,
                          c->resp.status,
                          (int) c->resp.payload_size,
                          c->resp.payload);
        }
        else {
            flb_plg_error(ctx->ins, "%s:%i, HTTP status=%i",
                          ctx->host, ctx->port, c->resp.status);
        }
        /* Retryable status codes according to OTLP spec */
        if (c->resp.status == 429 || c->resp.status == 502 || c->resp.status == 503
            || c->resp.status == 504) {
            out_ret = FLB_RETRY;
        }
        else if (c->resp.status == 400) {
            /* OTLP spec says 400 must not be retried */
            out_ret = FLB_ERROR;
        }
        else {
            /* OTLP spec says to treat status codes not documented "according to HTTP specifications" */
            out_ret = FLB_ERROR;
        }
    }
To Reproduce
- Run Fluent Bit with an OpenTelemetry output.
- Wait for server to respond with a 400, 429, 502, 503, or 504.
- See Fluent Bit retry the send until restarted.
Expected behavior
FLB_ERROR is returned when an HTTP 400 Bad Request is received.
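As a rough sketch of the expected mapping, the decision could be isolated in a small standalone helper. The enum and function names below are hypothetical and are not part of the Fluent Bit codebase; the enum values only stand in for FLB_OK, FLB_RETRY, and FLB_ERROR, and the 200-205 success range is assumed from the existing status check.

```c
/* Hypothetical result codes for this sketch. They stand in for
 * Fluent Bit's FLB_OK / FLB_RETRY / FLB_ERROR and are not the real macros. */
enum otlp_result {
    OTLP_RESULT_OK,
    OTLP_RESULT_RETRY,
    OTLP_RESULT_ERROR
};

/* Map an HTTP response status to a flush result (hypothetical helper).
 * Only 429, 502, 503, and 504 are treated as retryable; every other
 * non-success status, including 400, is treated as a permanent error. */
enum otlp_result otlp_classify_status(int status)
{
    /* Success range assumed from the existing check (200 through 205). */
    if (status >= 200 && status <= 205) {
        return OTLP_RESULT_OK;
    }

    switch (status) {
    case 429: /* Too Many Requests */
    case 502: /* Bad Gateway */
    case 503: /* Service Unavailable */
    case 504: /* Gateway Timeout */
        return OTLP_RESULT_RETRY;
    default:
        /* 400 Bad Request and any other status: do not retry */
        return OTLP_RESULT_ERROR;
    }
}
```

With a helper like this, otlp_classify_status(400) yields the error result while otlp_classify_status(503) yields the retry result, which is the behavior described above.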
Your Environment
- Version used: 4.0.1
- Filters and plugins: out_opentelemetry
Additional context
Fluent Bit will attempt to retry the failed record until the process is restarted.