VictoriaMetrics
VictoriaMetrics copied to clipboard
ingestion: make stream processing for influx protocol optional
Is your feature request related to a problem? Please describe
VictoriaMetrics processes data ingested in influxdb protocol in a streaming fashion. This allows to reduce memory usage and improve ingestion performance at the same time. However, there are cases when streaming could become harmful. For example, clients like telegraf could start retrying ingesting batches of data if ingestion request failed in the middle. This would mean that VM already processed and persisted 50% of the batch, and now it would need to do this again on the next retry attempt from telegraf. In some cases, this situation can develop in a snowballing of ingest requests from telegraf which just can't be processed in full.
Describe the solution you'd like
It would be great to have an option to disable streaming mode for HTTP (not UDP) influx protocol. This is especially important for VM installations that run with strict limit on resources.
@AndrewChubatiuk , the stream parsing mode must be disabled by default for HTTP-based data ingestion for Influx line protocol, since this is the major source of issues for Telegraf users. It should support stream_mode=1 http query arg, which could enable stream parsing mode.
When stream parsing mode is disabled, then the whole request body must be read in memory before starting to parse it. The success response must be sent to the client (e.g. Telegraf) immediately after reading the request body. After that the request body must be parsed and saved in the storage. Partial request body must be rejected (for example, if Telegraf closes the connection in the middle of sending of request body because of timeout).
Stream parsing mode should remain enabled for TCP-based data ingestion, since it is common practice to send unlimited amounts of lines over a single TCP connection, so they should be parsed and pushed to the storage as fast as possible.
When stream parsing mode is disabled, then the whole request body must be read in memory before starting to parse it. The success response must be sent to the client (e.g. Telegraf) immediately after reading the request body.
@valyala This would mean that if there are parsing errors - the client won't ever know about this. If we're doing it like batch-processing, should we return ok only after everything was parsed and scheduled for ingestion? This would have performance impact, of course. But this is expected behavior for per-batch processing, and something many users asked before in tickets like this https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4446.
This was discussed internally and we agreed that with batch processing VM should respond to user only after parsing is finished. The whole batch should be rejected and client should get error in response if at least one row in the batch failed to parse. In this case, VM should also respond with non-retryable status code 422 (http.StatusUnprocessableEntity) to the client.
Unfortunately, telegraf doesn't have default settings for non-retryable requests, it needs to be configured manually - see non_retryable_statuscodes in https://github.com/influxdata/telegraf/blob/master/plugins/outputs/http/README.md#configuration. We should mention this in docs.
The error message returned to the client should also mention possibles ways to mitigate this, like enable stream mode to ignore bad lines. cc @AndrewChubatiuk
@hagen1778 should response contains information about all failed rows or just one is enough?
@hagen1778 should response contains information about all failed rows or just one is enough?
I think, the first encountered error is enough.
thought telegraf users relying on influxdb output, not http. This output doesn't retry any 4xx error
Pull request https://github.com/VictoriaMetrics/VictoriaMetrics/pull/7165 has been merged and will be available in the next release. Starting from new release, ingestion via influx protocol will be processed in batch mode by default. To enable stream mode Stream-Mode: "1" HTTP header needs to be passed with each request. See more details at https://docs.victoriametrics.com/#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf
the stream parsing mode must be disabled by default for HTTP-based data ingestion for Influx line protocol, since this is the major source of issues for Telegraf users. It should support stream_mode=1 http query arg, which could enable stream parsing mode.
@valyala we switched to using HTTP header instead of query param because telegraf's output outputs.influxdb doesn't allow setting custom GET params (or overrides them if they set in URL). But it supports HTTP headers.
Re-opening this ticket until it is released.
The feature was included into v1.105.0