Support ingesting data from named pipes
Current Vector Version
vector 0.11.0 (gbffecc4 x86_64-unknown-linux-gnu 2020-09-14)
Use-cases
(Copied from discord: https://discordapp.com/channels/742820443487993987/746070591097798688/755150918969982986)
The user has a setup where logs are written to a dynamic set of named pipes, which they would like to ingest with Vector.
Attempted Solutions
We originally thought that the file source might be able to handle this, but it cannot: fingerprinting and checkpointing require seeking, which named pipes do not support:
Sep 14 16:03:46.828 INFO vector: Log level "info" is enabled.
Sep 14 16:03:46.835 INFO vector: Loading configs. path=["/tmp/fifo.toml"]
Sep 14 16:03:46.868 INFO vector::topology: Running healthchecks.
Sep 14 16:03:46.868 INFO vector::topology: Starting source "my_source_id"
Sep 14 16:03:46.868 INFO vector::topology::builder: Healthcheck: Passed.
Sep 14 16:03:46.869 INFO vector::topology: Starting sink "console"
Sep 14 16:03:46.869 INFO vector: Vector has started. version="0.11.0" git_version="v0.9.0-677-gbffecc4" released="Mon, 14 Sep 2020 15:46:35 +0000" arch="x86_64"
Sep 14 16:03:46.869 INFO source{name=my_source_id type=file}: vector::sources::file: Starting file server. include=["/tmp/my_pipe"] exclude=[]
Sep 14 16:03:52.868 ERROR source{name=my_source_id type=file}:file_server: vector::internal_events::file: failed reading file for fingerprinting. path="/tmp/my_pipe" error=Os { code: 29, kind: Other, message: "Illegal seek" }
The error is logged when the pipe is written to. Additionally, Vector hangs when trying to shut down.
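For context, the "Illegal seek" message corresponds to the OS error ESPIPE, which any seek on a FIFO returns. The following is a minimal sketch that reproduces it outside Vector; the function name and the temporary path are made up for illustration, and it assumes a POSIX system.

```python
import errno
import os
import tempfile


def seek_error_on_fifo() -> int:
    """Create a FIFO, attempt to seek on it, and return the errno raised (0 if none)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my_pipe")  # hypothetical path, mirrors /tmp/my_pipe
        os.mkfifo(path)
        # O_NONBLOCK lets a read-only open() return without waiting for a writer.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            os.lseek(fd, 0, os.SEEK_SET)
        except OSError as exc:
            return exc.errno
        finally:
            os.close(fd)
    return 0


if __name__ == "__main__":
    code = seek_error_on_fifo()
    print(code, errno.errorcode.get(code))
```

Any source that fingerprints or checkpoints by seeking will hit the same error on a pipe, regardless of configuration.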
The config:
[sources.my_source_id]
# General
type = "file" # required
data_dir = "/tmp/vector" # optional, no default
include = ["/tmp/my_pipe"]
[sinks.console]
inputs = ["my_source_id"]
type = "console"
encoding.codec = "json"
Vector can read the pipe through the stdin source, but that only works for one pipe per Vector instance.
Proposal
Not sure! We could extend the file source, the stdin source, or add a new one to handle this depending on feasibility.
I'm in need of this as well. It seems that the behavior of the source used for named pipes would be similar to the file descriptor source, except that:
- When the specified path does not exist, Vector should create the FIFO
- When the writer closes the pipe, Vector should reopen it
I think this is sufficiently different from both the file and the file descriptor source, so a new "Named Pipe" source might be best.
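To make the two behaviors above concrete, here is a rough Python sketch of the read loop such a source might run. It is not Vector code; `ensure_fifo`, `read_session`, and `follow_fifo` are hypothetical names, and the sketch assumes a POSIX system with blocking opens.

```python
import os
import stat
from typing import Iterator


def ensure_fifo(path: str) -> None:
    """Create a FIFO at `path` if nothing exists there; reject existing non-FIFO files."""
    try:
        os.mkfifo(path)
    except FileExistsError:
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise ValueError(f"{path} exists but is not a FIFO")


def read_session(path: str) -> Iterator[str]:
    """Yield lines from one writer session; finishes once every writer has closed."""
    # open() blocks here until some process opens the FIFO for writing.
    with open(path, "r") as pipe:
        for line in pipe:  # iteration stops at end-of-file
            yield line.rstrip("\n")


def follow_fifo(path: str) -> Iterator[str]:
    """Yield lines indefinitely, reopening the FIFO after each end-of-file."""
    ensure_fifo(path)
    while True:
        yield from read_session(path)
```

One caveat with this close-and-reopen approach: a writer that connects, writes, and disconnects in the short window between the reader seeing end-of-file and closing its descriptor can lose data. A real implementation might instead keep its own write end open so the reader never sees end-of-file.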
Would a PR with this have a chance to be merged? Other ideas or suggestions?
I bumped into this when using the file source to ingest Nomad logs. Nomad creates named pipes, which Vector tries to seek.
This blows up at runtime with:
thread 'vector-worker' panicked at lib/file-source/src/file_watcher/mod.rs:124:67:
called `Result::unwrap()` on an `Err` value: Os { code: 29, kind: NotSeekable, message: "Illegal seek" }
I worked around it using the exclude option.
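For anyone applying the same workaround, the exclude list has to name the pipes. Below is a small sketch of one way to discover them; `find_fifos` is a hypothetical helper, not part of Vector or Nomad, and the directory to scan depends on your setup.

```python
import os
import stat
from typing import List


def find_fifos(root: str) -> List[str]:
    """Return the sorted paths of all named pipes found under `root`."""
    fifos = []
    # os.walk lists FIFOs among the non-directory entries, including dotfiles.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except FileNotFoundError:
                continue  # entry vanished between listing and stat
            if stat.S_ISFIFO(mode):
                fifos.append(path)
    return sorted(fifos)
```

The returned paths, or a glob that covers them, can then be placed in the file source's exclude option.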