Make Vector aware of available memory
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
We have had discussions floating around this topic in various issues, but I couldn't find a reference issue for it, so I'm creating this one.
Users have run into problems running Vector in memory-constrained environments, where it would be better for Vector to apply back-pressure than to keep increasing memory usage as it approaches its cap, thus avoiding an OOM kill.
Attempted Solutions
No response
Proposal
Vector is "memory capacity aware" and applies back-pressure rather than allocating when it is at risk of being OOM killed.
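To make the proposal concrete, here is a minimal sketch (not Vector's actual design) of one way a process could gate intake on memory headroom: poll `MemAvailable` from `/proc/meminfo` (Linux-only) and stop accepting new events while headroom is below an assumed floor. The names `MIN_HEADROOM_BYTES`, `available_memory_bytes`, and `accept_batch` are all invented for illustration.

```rust
// Hypothetical sketch, not Vector's implementation: gate event intake on
// available system memory, read from /proc/meminfo (Linux-only).
use std::fs;
use std::thread;
use std::time::Duration;

const MIN_HEADROOM_BYTES: u64 = 256 * 1024 * 1024; // assumed safety margin

/// Parse the `MemAvailable` line of /proc/meminfo, returning bytes.
fn available_memory_bytes() -> Option<u64> {
    let meminfo = fs::read_to_string("/proc/meminfo").ok()?;
    for line in meminfo.lines() {
        if let Some(rest) = line.strip_prefix("MemAvailable:") {
            let kb: u64 = rest.trim().trim_end_matches(" kB").trim().parse().ok()?;
            return Some(kb * 1024);
        }
    }
    None
}

/// Hypothetical intake loop: stop pulling new events (i.e. apply
/// back-pressure upstream) while memory headroom is below the floor.
fn intake_loop(mut accept_batch: impl FnMut()) {
    loop {
        match available_memory_bytes() {
            Some(avail) if avail < MIN_HEADROOM_BYTES => {
                // Don't allocate for new events; let upstream back up.
                thread::sleep(Duration::from_millis(100));
            }
            _ => accept_batch(),
        }
    }
}

fn main() {
    intake_loop(|| { /* pull and process one batch of events */ });
}
```

A real implementation would also have to respect cgroup memory limits (which `/proc/meminfo` does not reflect) and account for memory already committed to in-flight batches.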
References
- https://github.com/vectordotdev/vrl/issues/82
- https://github.com/vectordotdev/vector/issues/11770#issuecomment-1068853439
- https://github.com/vectordotdev/vector/issues/17123
Version
vector 0.20.0 (x86_64-apple-darwin 2a706a3 2022-02-10)
Is there no solution to this problem?
No response?
@baiyibing123 We have the same concerns, as we often hit OOM kills with Vector. That said, I'll say that the best way to move this forward would be code contributions that address the issue.
Agreed, this is likely to be a very large and invasive project that we unfortunately haven't been able to prioritize just yet. I realize it would be very useful.
Perhaps the first thing to do would be to build this awareness into individual components, to make it a more manageable process. Is it possible for each component to understand the memory it is using? Can this be exposed as an internal (Prometheus) metric easily, or would that require significant work? I would theorize that each aggregation, at least, could understand its memory usage, since each metric has an easily understood size. The same goes for enrichment tables, which have fixed sizes once indexed. Buffers in sinks. Lua. Reduce? I am not sure what other component types would take up significant memory, other than the memory required to thread N individual processing pipelines. But I am guessing without knowing the code base at all.
I think the biggest items would be aggregations (cardinality) and buffers in sinks; maybe start there?
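As a thought experiment on the "each component understands its own memory" idea, here is a hedged sketch of an aggregation-like state that estimates its footprint from per-entry sizes. `AggState`, `MetricEntry`, the slot-overhead constant, and the metric name in the comment are all hypothetical; Vector's real aggregate transform is structured differently.

```rust
// Hypothetical per-component memory self-accounting for an aggregation:
// estimate state size from entry counts and fixed per-entry sizes.
use std::collections::HashMap;
use std::mem::size_of;

struct MetricEntry {
    value: f64,
    timestamp: u64,
}

struct AggState {
    series: HashMap<String, MetricEntry>,
}

impl AggState {
    /// Rough estimate: key string bytes, plus the per-entry struct size,
    /// plus an assumed fixed bookkeeping cost per hash-map slot.
    fn estimated_bytes(&self) -> usize {
        const SLOT_OVERHEAD: usize = 48; // assumed hash-map overhead
        self.series
            .iter()
            .map(|(k, _)| k.capacity() + size_of::<MetricEntry>() + SLOT_OVERHEAD)
            .sum()
    }
}

fn main() {
    let mut state = AggState { series: HashMap::new() };
    state.series.insert(
        "http_requests_total{path=\"/\"}".to_string(),
        MetricEntry { value: 1.0, timestamp: 0 },
    );
    // In Vector this estimate could be published as an internal gauge,
    // e.g. `component_estimated_memory_bytes` (hypothetical metric name).
    println!("estimated bytes: {}", state.estimated_bytes());
}
```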
We did actually take one stab at exposing allocations per component in https://vector.dev/blog/tracking-allocations/. It's still experimental, currently.
I agree each component could make an attempt at managing its own memory, but without a framework in place for this in Vector it may be a bit fraught to do per-component. I think your sense is roughly right, though: sinks tend to use memory for concurrent requests and memory buffers, some transforms like aggregate maintain state, etc.
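For reference, the general technique behind allocation tracking of the kind that blog post describes is a wrapping global allocator that counts bytes as they are allocated and freed. A minimal process-wide sketch follows; Vector's experimental implementation goes further and attributes allocations to individual components, which this version does not attempt.

```rust
// Minimal sketch: count live heap bytes by wrapping the system allocator.
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let v: Vec<u8> = Vec::with_capacity(1024);
    println!("live heap bytes: {}", ALLOCATED.load(Ordering::Relaxed));
    drop(v);
}
```

The atomic counter is why this pattern carries overhead on every allocation, which is consistent with the throughput cost mentioned below.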
Thanks for the note on the per-component beta framework. I didn't know that existed, though the "~20% less throughput" comment is concerning. It might be useful for debugging, but in places where we're already at the edge of performance (hence the debugging), it may be a little troublesome.