Add ability to include arbitrary graph attributes in graphviz DOT format
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
We use Vector's graph output to describe how data flows through our system. However, it currently shows only the most abstract view: the flows between sources, transforms (such as remaps), and sinks. Nothing describes what each component does, so the graph is useful, but not nearly as useful as it could be. I would like to document our Vector pipelines entirely in Graphviz DOT format, so that a full map of the process is self-explanatory. That means notes, colors, URLs (which could also link to our Grafana platform, using in-line variables!), shapes, fonts, and so on: everything that is available in DOT.
Attempted Solutions
Hand-editing the Graphviz files is possible, but it isn't much easier than writing the whole graph from scratch by hand. Having the comments live in-line with the configuration, so that they follow the flows as they change, would be very helpful, especially where multiple authors work on the same data sets and need a holistic way of understanding the entire processing pipeline in a single view. It would also delegate documentation down to the component level automatically.
Proposal
Each defined stanza (source, transform, sink) would be able to carry a block of DOT definition. This could be a single block of text, or it could be structured in a more easily parsed way. Vector wouldn't need to parse it much; it only needs to pass it through to the "vector graph" output for each object. The only intersection with the current model that I can think of is the object shape: a user-supplied shape would override Vector's choice of "invtrapezium", "diamond", or one of the few other shapes currently in use.
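To illustrate the pass-through and shape-override behaviour described above, here is a minimal Python sketch. It is not Vector's implementation: the `DEFAULT_SHAPES` table, the `render_node` helper, and the idea of supplying attributes as a flat key/value mapping are all assumptions made for the example.

```python
# Hypothetical sketch: merge user-supplied DOT attributes into a node line.
# The kind-to-shape mapping below is an assumption for illustration only.
DEFAULT_SHAPES = {
    "source": "trapezium",
    "transform": "diamond",
    "sink": "invtrapezium",
}


def quote(value):
    """Wrap a value in a DOT double-quoted string, escaping embedded quotes."""
    return '"' + str(value).replace('"', '\\"') + '"'


def render_node(name, kind, user_attrs=None):
    """Return one DOT node statement for a component.

    User attributes are passed through untouched; if they include "shape",
    it replaces the default shape chosen for the component kind.
    """
    attrs = {"shape": DEFAULT_SHAPES[kind]}
    attrs.update(user_attrs or {})
    body = ", ".join(f"{key}={quote(val)}" for key, val in attrs.items())
    return f"  {quote(name)} [{body}]"


if __name__ == "__main__":
    print("digraph {")
    print(render_node("in", "source"))
    print(render_node("parse", "transform", {"shape": "box", "color": "red"}))
    print(render_node("out", "sink", {"tooltip": "Ships parsed events"}))
    print('  "in" -> "parse"')
    print('  "parse" -> "out"')
    print("}")
```

Because the user mapping is applied last, no special-casing of `shape` is needed: any attribute Vector sets by default can be overridden the same way.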
Adding a new variable such as "stanza_name" would also be convenient: the DOT syntax could then reference the name of that particular source/transform/sink through the variable, without a lot of copying and pasting of the name. This is especially handy because source/transform/sink names change fairly often in our configurations.
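One way the variable could work is plain text substitution before the attributes are written out. The Python sketch below assumes a `{stanza_name}` placeholder syntax and a hypothetical `expand_attrs` helper; neither exists in Vector today, and the Grafana URL is a made-up example.

```python
# Hypothetical sketch: expand a "{stanza_name}" placeholder in attribute values.
# The placeholder syntax is an assumption; the issue only proposes the idea.
PLACEHOLDER = "{stanza_name}"


def expand_attrs(name, attrs):
    """Return a copy of attrs with the placeholder replaced by the component name."""
    return {key: str(val).replace(PLACEHOLDER, name) for key, val in attrs.items()}


if __name__ == "__main__":
    attrs = {
        "URL": "https://grafana.example.com/d/vector?var-component={stanza_name}",
        "tooltip": "Metrics for {stanza_name}",
    }
    # Renaming the component only requires changing this one string.
    print(expand_attrs("nginx_logs", attrs))
```

With this approach, renaming a component in the configuration automatically updates every attribute that refers to it.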
At a minimum, just attributes per node would be sufficient to start. https://graphviz.org/docs/nodes/
References
No response
Version
vector 0.39.0 (x86_64-unknown-linux-gnu)