Make Vector aware of available memory

Open jszwedko opened this issue 3 years ago • 7 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

We have had discussions floating around this in various issues, but I couldn't find a reference issue for it, so I am creating this one.

Users have had issues running Vector in memory-constrained environments, where it would be better for Vector to apply back-pressure than to keep increasing memory usage as it approaches its cap, thus avoiding an OOM kill.

Attempted Solutions

No response

Proposal

Vector is "memory capacity aware" and applies back-pressure rather than allocating when it is at risk of being OOM killed.
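As a rough illustration of the idea only (this is not Vector's API, and `MemoryBudget`, `try_reserve`, and `release` are hypothetical names), a shared byte budget could refuse a reservation when the configured limit would be exceeded, so the caller stops pulling events instead of allocating:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical byte budget shared by the components of a pipeline.
/// Illustrative only; nothing here exists in Vector.
pub struct MemoryBudget {
    limit: usize,
    used: AtomicUsize,
}

impl MemoryBudget {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Try to account for `bytes` more memory. Returns `false` when the
    /// reservation would exceed the limit; the caller is then expected to
    /// stop reading input (apply back-pressure) rather than allocate.
    pub fn try_reserve(&self, bytes: usize) -> bool {
        let mut current = self.used.load(Ordering::Relaxed);
        loop {
            let next = match current.checked_add(bytes) {
                Some(n) if n <= self.limit => n,
                _ => return false,
            };
            match self.used.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                // Another thread changed the counter; retry with its value.
                Err(observed) => current = observed,
            }
        }
    }

    /// Return bytes to the budget once events are flushed or dropped.
    /// Saturates at zero so a double release cannot wrap the counter.
    pub fn release(&self, bytes: usize) {
        let _ = self.used.fetch_update(Ordering::AcqRel, Ordering::Relaxed, |u| {
            Some(u.saturating_sub(bytes))
        });
    }

    pub fn used(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}
```

In this sketch a source would call `try_reserve` with an estimated event size before accepting a batch and pause reading when it returns `false`; a sink would call `release` after a successful flush. How event sizes are estimated, and where the limit comes from (for example a cgroup memory limit), is left open here.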

References

  • https://github.com/vectordotdev/vrl/issues/82
  • https://github.com/vectordotdev/vector/issues/11770#issuecomment-1068853439
  • https://github.com/vectordotdev/vector/issues/17123

Version

vector 0.20.0 (x86_64-apple-darwin 2a706a3 2022-02-10)

jszwedko avatar Mar 22 '22 20:03 jszwedko

Is there no solution to this problem?

baiyibing123 avatar Apr 03 '24 02:04 baiyibing123

No response?

baiyibing123 avatar Apr 09 '24 08:04 baiyibing123

@baiyibing123 We have the same concerns, as we often hit OOM death issues with Vector. However, I'll say that the best solution for this would be contributions of code to solve the issue.

johnhtodd avatar Apr 09 '24 18:04 johnhtodd

Agreed, this is likely to be a very large and invasive project that we unfortunately haven't been able to prioritize just yet. I realize it would be very useful.

jszwedko avatar Apr 09 '24 20:04 jszwedko

Perhaps the first thing to do would be to build memory awareness into components, to make this a more manageable process. Is it possible for each component to understand the memory that it is using? Can this be exposed in an internal (Prometheus) metric easily, or would that require significant work? I would theorize that each aggregation at least could understand its memory usage, since each metric has an easily understood size. Same for enrichments, which have fixed sizes once indexed. Buffers in sinks. Lua. Reduce? I am not sure what other types of components would take up significant memory, other than the memory required to run N individual processing pipelines. But I am guessing without knowing the code base at all.

I think the biggest item would be aggregations (cardinality) and buffers in sinks - maybe start there?
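To make the per-component accounting idea concrete, here is a minimal sketch under stated assumptions: `ComponentMemory`, `estimate_series_bytes`, and the metric name `component_estimated_memory_bytes` are all hypothetical and not taken from Vector's code base. It keeps a byte estimate per component and renders it as a Prometheus-style gauge:

```rust
use std::collections::BTreeMap;

/// Hypothetical per-component memory estimates, keyed by component id.
#[derive(Default)]
pub struct ComponentMemory {
    bytes_by_component: BTreeMap<String, usize>,
}

impl ComponentMemory {
    /// Record that `component` now holds `bytes` more state.
    pub fn add(&mut self, component: &str, bytes: usize) {
        *self
            .bytes_by_component
            .entry(component.to_string())
            .or_insert(0) += bytes;
    }

    /// Record that `component` released `bytes` of state (saturating at zero).
    pub fn sub(&mut self, component: &str, bytes: usize) {
        if let Some(v) = self.bytes_by_component.get_mut(component) {
            *v = v.saturating_sub(bytes);
        }
    }

    /// Render the estimates in Prometheus text format.
    /// Label-value escaping is omitted for brevity.
    pub fn render_prometheus(&self) -> String {
        let mut out = String::from("# TYPE component_estimated_memory_bytes gauge\n");
        for (id, bytes) in &self.bytes_by_component {
            out.push_str(&format!(
                "component_estimated_memory_bytes{{component_id=\"{}\"}} {}\n",
                id, bytes
            ));
        }
        out
    }
}

/// Rough size of one aggregated series: name and tag bytes plus one f64 value.
/// Real overhead (map nodes, allocator padding) is ignored in this estimate.
pub fn estimate_series_bytes(name: &str, tags: &[(&str, &str)]) -> usize {
    let tag_bytes: usize = tags.iter().map(|(k, v)| k.len() + v.len()).sum();
    name.len() + tag_bytes + std::mem::size_of::<f64>()
}
```

An aggregation could call `add` with `estimate_series_bytes(...)` whenever a new series appears and `sub` when it is flushed, which would make cardinality growth visible per component. The estimate is deliberately crude and only tracks payload bytes.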

johnhtodd avatar Apr 09 '24 20:04 johnhtodd

We did actually take one stab at exposing allocations per component in https://vector.dev/blog/tracking-allocations/. It is currently still experimental.

I agree each component could make an attempt at managing its own memory, but without a framework in place for this in Vector it may be a bit fraught to do it per component. I think your sense is roughly right though: sinks tend to use memory for concurrent requests and memory buffers, some transforms like aggregate maintain state, etc.

jszwedko avatar Apr 09 '24 21:04 jszwedko

Thanks for the note on the per-component beta framework. I didn't know that existed, though the "~20% less throughput" comment is concerning. It might be useful for debugging, but in places where we're at the edge of performance (hence the debugging) it may be a little troublesome.

johnhtodd avatar Apr 09 '24 22:04 johnhtodd