Add vector package
Description
🚀 What is Vector and why we're adding it
Vector is an observability data pipeline built in Rust. It provides a modular architecture to collect, transform, and route logs and metrics efficiently. We're adopting Vector to:
- Preprocess log streams (e.g., filter out noisy logs, deduplicate repeated messages)
- Throttle excessive log traffic
- Enable dynamic and runtime-adjustable pipelines
- Monitor pipeline performance via Prometheus metrics
🔌 Integration with newlogd
Vector is integrated as a socket-based middleware layer between newlogd and its sinks. Logs are passed through Vector before being written to disk or uploaded.
This setup allows us to keep newlogd as the primary logging agent while gradually introducing Vector’s advanced capabilities without disrupting existing workflows.
Check out the LOGGING.md doc to learn more.
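To make the "socket-based middleware" idea concrete, here is a minimal Python sketch. It is not the code in this PR: the function names are hypothetical, `socket.socketpair()` stands in for the real socket paths, and the filter only drops consecutive repeated lines to illustrate deduplication.

```python
import socket


def dedupe_filter(lines):
    """Yield lines, skipping any line identical to the one just before it."""
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


def run_middleware(upstream: socket.socket, downstream: socket.socket) -> None:
    """Read newline-delimited logs from upstream, filter them, forward downstream."""
    with upstream.makefile("r", encoding="utf-8") as reader:
        for line in dedupe_filter(reader):
            downstream.sendall(line.encode("utf-8"))
    downstream.shutdown(socket.SHUT_WR)  # tell the sink that the stream ended


def demo() -> list[str]:
    # Two connected socket pairs stand in for "agent -> middleware -> sink".
    producer, mid_in = socket.socketpair()
    mid_out, consumer = socket.socketpair()
    producer.sendall(b"boot ok\nlink flap\nlink flap\nlink flap\ndisk full\n")
    producer.shutdown(socket.SHUT_WR)  # signal end of stream
    run_middleware(mid_in, mid_out)
    with consumer.makefile("r", encoding="utf-8") as reader:
        received = [line.rstrip("\n") for line in reader]
    for s in (producer, mid_in, mid_out, consumer):
        s.close()
    return received
```

The logging agent and the sink never talk to each other directly here, so the middle stage can be changed or bypassed without touching either side.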
⚙️ Dynamic configuration support
To allow runtime updates of Vector’s behavior, we support user-supplied configuration via a base64-encoded payload. The flow is as follows:
- A new Vector config is uploaded (base64-encoded) via global options.
- newlogd decodes the config and writes it to /persist/vector/config/vector.yaml.new.
- Vector watches for changes using inotifywait:
  - If the new config is valid, it is promoted atomically to the live config.
  - If invalid, it is discarded, and the existing config remains active.
This ensures safe, crash-free config updates without requiring a container restart.
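The decode, validate, and promote steps can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the implementation in this PR: the function names and the `validate` callback are hypothetical (a real callback could shell out to Vector's `validate` subcommand), the live file name `vector.yaml` is inferred from the `.new` path above, and the inotifywait-based watching is left out.

```python
import base64
import os
import tempfile
from pathlib import Path
from typing import Callable


def stage_config(payload_b64: str, new_path: Path) -> None:
    """Decode the base64 payload and write it to the staging file (*.new)."""
    data = base64.b64decode(payload_b64, validate=True)
    new_path.parent.mkdir(parents=True, exist_ok=True)
    new_path.write_bytes(data)


def promote_if_valid(new_path: Path, live_path: Path,
                     validate: Callable[[Path], bool]) -> bool:
    """Promote the staged config if it validates; otherwise discard it."""
    if validate(new_path):
        # os.replace is an atomic rename when both paths share a filesystem.
        os.replace(new_path, live_path)
        return True
    new_path.unlink()  # invalid: drop it, the live config stays untouched
    return False


def demo() -> tuple[bool, str, bool, str]:
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "vector.yaml"
        new = Path(tmp) / "vector.yaml.new"
        live.write_text("sources: {}\n")

        # Stand-in validator (assumption): any non-blank file counts as valid.
        def looks_valid(p: Path) -> bool:
            return bool(p.read_text().strip())

        stage_config(base64.b64encode(b"sinks: {}\n").decode(), new)
        first = promote_if_valid(new, live, looks_valid)
        after_valid = live.read_text()

        stage_config(base64.b64encode(b"   ").decode(), new)
        second = promote_if_valid(new, live, looks_valid)
        return first, after_valid, second, live.read_text()
```

Because the live file is only ever replaced by a rename, a reader sees either the old config or the new one, never a partially written file.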
📦 Other
- I looked at vector's memory usage during my tests and it was never higher than 3MB
- I had to increase ROOTFS_MAXSIZE_MB to 280MB (+10MB) to fit vector in. However, we can decrease it in the future if we remove components or functions that vector replaces
- Right now we build our fork of the Vector project separately with only the necessary features https://github.com/rucoder/eve-mini-vector/
🧪 Next steps
- Create a repo on Docker Hub for the new package
- Integrate the package build process into its own Dockerfile instead of using a separate repo
- Remove the log filtering and deduplication mechanisms. They are marked as deprecated for now.
PR dependencies
Depends on https://github.com/lf-edge/eve/pull/5009
How to test and validate this PR
The old tests should be used, since the overall functionality remains the same.
I will also provide more Eden tests to test the new vector.config parameter.
Changelog notes
Added Vector as a tool to transform our logs and metrics.
PR Backports
None
Checklist
- [x] I've provided a proper description
- [x] I've added the proper documentation
- [x] I've tested my PR on amd64 device
- [ ] I've tested my PR on arm64 device
- [x] I've written the test verification instructions
- [x] I've set the proper labels to this PR
I suggest adding an AppArmor profile for vector.
I apologize for coming to this 2 days late; I had kept it right in front of me and still got delayed.
I didn't quite get where vector fits into the pipeline: which part it is replacing, or which two (or more) existing parts it sits between. There is the doc and especially the diagram, but which of those parts is performed by which component?
> The old tests should be used, since the overall functionality remains the same.

I would also add test scenarios that verify at least:
- turn the vector filtering on/off
- change config
- some new transformation
> I apologize for coming to this 2 days late; I had kept it right in front of me and still got delayed.
> I didn't quite get where vector fits into the pipeline: which part it is replacing, or which two (or more) existing parts it sits between. There is the doc and especially the diagram, but which of those parts is performed by which component?

the top part of the diagram is vector and the bottom one is newlogd. I'll make the titles a little bigger :)
Nice diagram here: https://github.com/lf-edge/eve/pull/5008/commits/e82f0e541fd230468106676d1c14e515abfff69b
A perfect commit where the vector is connected is here: https://github.com/lf-edge/eve/pull/5008/commits/debe6d7ecb76d8ee58d656749a7e51abc502ccc7
If the hashes are changed, just look through the list of commits and find those ones:
- docs: add Vector logging documentation
- connect vector to newlog through sockets
There is that really good "EVE Logging Flows" diagram about ⅓ of the way through LOGGING.md. Where does vector fit within that diagram? Is it a subcomponent of one of the existing components or a new one? If new, can we modify the diagram to add it so that it is clear?
Oh... Good point. LOGGING.md should definitely be updated within this PR. The diagram, I mean.
@deitch @OhmSpectator there are no sources for the diagram that you mention, that's why I created a new diagram in the Vector section of the LOGGING.md document
> no sources

Ah, good point. The original commit was @naiming-zededa ; Naiming, do you have the source to the newlog diagram?
I also tried asking AI to convert it to mermaid. Here is the best I got so far. If I can improve it, I will (look at the source of the comment to see the original mermaid text):
flowchart LR
A[containerd Processes] -->|logs| B
C["Pillar & other services"] -->|logs| B
B -->|logs| C
B -->|log query| D
E["Kernel messages (/dev/kmsg)"] -->|logs| D
F["Syslog messages (/dev/log)"] -->|logs| D
D -->|formatted logs| G[Temp log files\\in /persist/newlog/collect]
D -->|gzipped logs| H[gzip log files\\in /persist/newlog/\\keepSentQueue\\devUpload\\appUpload]
H -->|gzipped logs| I["loguploader service (https)"]
I -->|API| J[Cloud Logging Services]
%% Group memlogd and newlogd
subgraph "Core Logging"
B[memlogd Ringbuffer]
D[newlogd container]
end
%% Group pillar components
subgraph "Pillar Container"
C
I
end
%% Group volume storage
subgraph "Pillar Volumes"
G
H
end
%% Force layout: stack memlogd above newlogd
B -.-> D
Try this mermaid one:
graph TD
%% LEFT COLUMN (Sources)
subgraph SR[Sources]
A[containerd Processes]
E["Kernel messages (/dev/kmsg)"]
F["Syslog messages (/dev/log)"]
C["Pillar & other services"]
end
B[memlogd Ringbuffer]
%% MIDDLE COLUMN (Core Logging)
subgraph CL[Core Logging]
direction TB
D[newlogd container]
end
%% RIGHT COLUMN (Pillar Container and Volumes)
I["loguploader service (https)"]
subgraph PV[Pillar Volumes]
G[Temp log files\\in /persist/newlog/collect]
H[gzip log files\\in /persist/newlog/\\keepSentQueue\\devUpload\\appUpload]
end
%% FLOW CONNECTIONS
A -->|logs| B
C -->|logs| B
B -->|log query| D
E -->|logs| D
F -->|logs| D
D -->|formatted logs| G
D -->|gzipped logs| H
H -->|gzipped logs| I
I -->|API| J[Cloud Logging Services]
@deitch isn't this diagram sufficient? docs/images/vector.drawio.png
can we build it instead of pulling it?
@shjala yes, I will do it in the next iteration
Is there any way to configure it to access the API endpoint over UDS?
no, the API is just for metrics and similar stuff (and I removed it because we expose metrics through a prometheus exporter)
go tests fail because the package eve-dom0-ztest is not yet published to Docker Hub and can only be found in linuxkit cache, while make test builds using docker, so it's looking for eve-dom0-ztest in docker images
I hope @christoph-zededa has an idea about it
With the help from @europaul I have this: https://github.com/lf-edge/eve/pull/5027/commits/06f12f3045d4253c879fba1dd19185f238db669b
I managed to build both x86 and arm64 versions of mini-vector. Here is how I did it:
- Clone the original vector repo
- Install the cross tool like it's done in Vector's workflows
- Change the Cargo.toml file to use the following compile flags:
[profile.release]
opt-level = "z"
debug = false
strip = true
lto = true
codegen-units = 1
and only the necessary features:
target-aarch64-unknown-linux-musl = [
"sources-socket",
"sources-internal_metrics",
"transforms-logs",
"sinks-socket",
"sources-prometheus-scrape",
"sinks-prometheus",
]
target-x86_64-unknown-linux-musl = [
"sources-socket",
"sources-internal_metrics",
"transforms-logs",
"sinks-socket",
"sources-prometheus-scrape",
"sinks-prometheus",
]
- Run make build-x86_64-unknown-linux-musl and make build-aarch64-unknown-linux-musl to generate vector binaries
- Add this Dockerfile:
FROM scratch AS target-amd64
ENV CARGO_BUILD_TARGET="x86_64-unknown-linux-musl"
FROM scratch AS target-arm64
ENV CARGO_BUILD_TARGET="aarch64-unknown-linux-musl"
FROM scratch AS target-riscv64
ENV CARGO_BUILD_TARGET="riscv64gc-unknown-linux-gnu"
FROM target-$TARGETARCH AS toolchain
COPY target/$CARGO_BUILD_TARGET/release/vector /usr/bin/vector
FROM alpine:3.21 AS runtime
COPY --from=toolchain /usr/bin/vector /usr/bin/vector
- Run docker buildx build --platform=linux/amd64,linux/arm64 -t paulzededa/eve-vector:0.0.5 --push . to build and push the vector base images to Docker Hub
Big thanks to @rene for helping figure out how to do the cross-compilation!
This is of course very hacky, but I haven't found a way yet to dockerize the build. cross needs to have access to docker and I don't think we can use docker in docker in our CI.
@rene I think the best way for now would be to really fork vector's repo, do the patching like described above and produce vector base images, that we'll later use from this vector package.
@OhmSpectator I think this PR is ready to merge. I noted the following action items to be addressed in a follow-up PR:
- find a way to dockerize the build or create a fork repo for vector-base
- address the Fatals - vector shouldn't fail like this and bring the system down (probably a good way is to auto-restart vector from the container's entrypoint)
- add AppArmor profile
- see if something can be done about the image size
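As an illustration of the auto-restart item above, here is a minimal supervisor loop in Python. It is only a sketch of the idea: the real entrypoint would more likely be a shell script, and the function name, restart limit, and delay are all hypothetical.

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int, delay_s: float = 1.0) -> int:
    """Run cmd and restart it after a non-zero exit, up to max_restarts times.

    Returns the number of times the command was started.
    """
    starts = 0
    while True:
        starts += 1
        result = subprocess.run(cmd)
        if result.returncode == 0 or starts > max_restarts:
            return starts
        time.sleep(delay_s)  # short pause so a crash loop does not spin the CPU


def demo() -> int:
    # A child process that always fails, to exercise the restart path.
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    return supervise(failing, max_restarts=2, delay_s=0)
```

For vector itself, the call might look like `supervise(["/usr/bin/vector", "--config", "/persist/vector/config/vector.yaml"], max_restarts=5)`, where the config path and the limit are assumptions. The point is that a crash in vector gets contained inside its own container instead of escalating.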
The ARM builds keep failing due to 429 Too Many Requests...
> - find a way to dockerize the build or create a fork repo for vector-base
> - address the Fatals - vector shouldn't fail like this and bring the system down (probably a good way is to auto-restart vector from the container's entrypoint)
> - add AppArmor profile
> - see if something can be done about the image size

I like the plan. Did you store it somewhere else? It would be useful to have it in our backlog so that others can observe it.
Can power failure result in truncated/corrupted files for vector? One possible place is the config file, but I don't know if vector writes and reads to other files in /persist.
@eriknordmark I think you got the only one, thanks! The other ones are just copying files around at the startup, so in case of power failure they will just try again at the next startup - no info lost, the corrupt files will be overwritten.
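For reference, one common way to keep a file such as the config safe across a power failure is to write a temporary file, fsync it, and rename it over the target. The Python sketch below shows that general technique; it is not necessarily how this PR handles it, and the helper name and `.tmp` suffix are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_durably(path: Path, data: bytes) -> None:
    """Write data so that a crash leaves either the old or the new file."""
    tmp_path = path.with_name(path.name + ".tmp")
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # push file contents to stable storage
    os.replace(tmp_path, path)    # atomic rename on the same filesystem
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)          # persist the rename itself (POSIX systems)
    finally:
        os.close(dir_fd)


def demo() -> tuple[bytes, bool]:
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "vector.yaml"
        write_durably(target, b"old\n")
        write_durably(target, b"new\n")
        leftover = (Path(d) / "vector.yaml.tmp").exists()
        return target.read_bytes(), leftover
```

If power is lost before the rename, the old file is still intact; if it is lost afterwards, the new file is already complete on disk.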
@eriknordmark @OhmSpectator please have another look, I think I addressed all the comments
Could you please rebase it on master?
Could you please provide end-to-end instructions for the verification team on how users are expected to use it? Or have we already added Eden tests for that? I want to see a scenario where a user provides a custom transformation, uploads it, and then sees the results. End to end.
@OhmSpectator I added end-to-end integration tests in Eden like you requested. https://github.com/lf-edge/eden/pull/1083
that was actually very useful since it helped me discover and fix a couple of bugs in this PR :) thank you very much for being persistent :heart:
I'm glad it helps =) Question. Regarding the PR into Eden. Will the new Eden tests start automatically as part of our Eden workflow? Or should we add them manually?
I'd say they should run automatically.
I cannot get the Nvidia build done for this PR... I tried a lot... Do we have to address it, @rene, @rucoder?...
I guess the problem is with the cross-compiler setup for those platforms. @europaul did you consider them in the Dockerfile?
I only built vector for x86_64-unknown-linux-musl and aarch64-unknown-linux-musl. Do I need to build for another triple as well?
I can only say that it's not a problem with the runners: the same runner-16 works here: https://github.com/lf-edge/eve/actions/runs/16373320810/job/46376212308?pr=5008 and does not work here: https://github.com/lf-edge/eve/actions/runs/16373320810/job/46318963950?pr=5008
When it gets stuck, I see it gets stuck here:
#10 [build 5/6] RUN GO111MODULE=on CGO_ENABLED=0 go build -ldflags "-s -w -X=main.Version=v0.0.0-20250718143818-f69f29f8851a
" -mod=vendor -o /out/usr/bin/newlogd ./cmd
Error: The operation was canceled.
So, it's newlogd build.