
Curly braces in the include directive, file pattern

Open Tarasovych opened this issue 3 years ago • 9 comments

Vector Version

# vector --version
vector 0.10.0 (g0f0311a x86_64-unknown-linux-gnu 2020-07-22)

Vector Configuration File

[sources.nginx]
  type = "file"
  include = ["/var/log/nginx/{site1.example.com,site2.example.com,site3.example.com}.access.log"]

Debug Output

INFO source{name=nginx type=file}: vector::sources::file: Starting file server. include=["/var/log/nginx/{site1.example.com,site2.example.com,site3.example.com}.access.log"] exclude=[]

Expected Behavior

Vector is able to watch all three files.

Actual Behavior

Vector could not watch those files.

Additional Context

ls -l /var/log/nginx/{site1.example.com,site2.example.com,site3.example.com}.access.log

works just fine.

[sources.nginx]
  type = "file"
  include = ["/var/log/nginx/site1.example.com.access.log", "/var/log/nginx/site2.example.com.access.log", "/var/log/nginx/site3.example.com.access.log"]

also works fine:

INFO source{name=nginx type=file}:file_server: file_source::file_server: Found file to watch. path="/var/log/nginx/site1.example.com.access.log" file_position=0
INFO source{name=nginx type=file}:file_server: file_source::file_server: Found file to watch. path="/var/log/nginx/site2.example.com.access.log" file_position=0
INFO source{name=nginx type=file}:file_server: file_source::file_server: Found file to watch. path="/var/log/nginx/site3.example.com.access.log" file_position=0

Is it possible to use {} in the include directive or am I missing something?

Tarasovych avatar Nov 12 '20 09:11 Tarasovych

It looks like this is an issue with the upstream glob crate we are using: https://github.com/rust-lang-nursery/glob/issues/2 . That crate doesn't seem to be actively maintained anymore, given its age, last commit, last release, and number of stale issues.

globset (https://crates.io/crates/globset) seems more active and does support the brace syntax.

jszwedko avatar Nov 12 '20 20:11 jszwedko
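
Editor's note: until the matcher itself understands braces, one possible workaround is to expand the brace group into separate patterns before handing them to the glob library, which is effectively what the explicit three-entry `include` list in the original report does by hand. The sketch below is only an illustration under stated assumptions: `expand_braces` is a hypothetical helper, not part of Vector or of the glob crate, it uses only the Rust standard library, it does not handle escaped braces, and it does not reproduce every shell rule (for example, `{}` expands to an empty string here).

```rust
/// Hypothetical helper (not a Vector or glob-crate API): expand shell-style
/// `{a,b,c}` groups in `pattern` into one plain pattern per alternative.
/// Assumptions: no escape handling; an unbalanced `{` is kept as literal text.
fn expand_braces(pattern: &str) -> Vec<String> {
    // Locate the first opening brace; without one there is nothing to expand.
    let open = match pattern.find('{') {
        Some(i) => i,
        None => return vec![pattern.to_string()],
    };

    // Scan for the matching close brace, recording top-level commas.
    let mut depth = 0usize;
    let mut close = None;
    let mut commas = Vec::new();
    for (offset, c) in pattern[open..].char_indices() {
        let i = open + offset;
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    close = Some(i);
                    break;
                }
            }
            ',' if depth == 1 => commas.push(i),
            _ => {}
        }
    }
    let close = match close {
        Some(i) => i,
        // Unbalanced brace: treat the whole pattern as literal.
        None => return vec![pattern.to_string()],
    };

    let prefix = &pattern[..open];
    let suffix = &pattern[close + 1..];

    // Split the group body at the recorded top-level commas.
    let mut alternatives = Vec::new();
    let mut start = open + 1;
    for &comma in &commas {
        alternatives.push(&pattern[start..comma]);
        start = comma + 1;
    }
    alternatives.push(&pattern[start..close]);

    // Rebuild one candidate per alternative and expand any remaining groups.
    let mut expanded = Vec::new();
    for alt in alternatives {
        let candidate = format!("{}{}{}", prefix, alt, suffix);
        expanded.extend(expand_braces(&candidate));
    }
    expanded
}
```

Calling `expand_braces` on the `include` pattern from the report yields the three explicit paths, each of which could then be passed to a matcher that has no brace support.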

@jszwedko Looks close to a drop-in-ish replacement too.

jamtur01 avatar Nov 12 '20 20:11 jamtur01

I don't think it's a replacement for us, unfortunately. From the readme comparison with glob:

Doesn't provide a recursive directory iterator of matching file paths

That's the exact function that we use in the file source.

lukesteensen avatar Nov 13 '20 02:11 lukesteensen
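
Editor's note: the gap described above is that a matcher-only library answers "does this path match?" but does not enumerate files on disk. A rough sketch of how the two halves could be combined is shown below. `walk_matching` and its `matches` predicate are hypothetical names for illustration, not Vector's implementation; the predicate stands in for whatever matcher a glob library would supply. It uses only the Rust standard library and, as a simplifying assumption, does not follow symlinked directories.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collect every non-directory entry under
/// `root` for which `matches` returns true. The predicate is a stand-in for a
/// glob matcher; this function only supplies the directory traversal.
fn walk_matching(root: &Path, matches: &dyn Fn(&Path) -> bool) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack instead of recursion, so deep trees do not grow the call stack.
    let mut pending = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // `file_type` does not follow symlinks, so a symlink to a
            // directory is not descended into (assumption of this sketch).
            if entry.file_type()?.is_dir() {
                pending.push(path);
            } else if matches(&path) {
                found.push(path);
            }
        }
    }

    // Sort so the result does not depend on directory iteration order.
    found.sort();
    Ok(found)
}
```

A caller would pass a closure such as `|p: &Path| p.to_string_lossy().ends_with(".access.log")`, or a closure wrapping a real glob matcher. Note that this visits every file under `root`, which may be more expensive than an iterator that prunes directories using the pattern.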

Yeah, that is an unfortunate gap. I do see some discussion around that in https://github.com/BurntSushi/ripgrep/pull/765

From there I found a link to https://github.com/gilnaa/globwalk which might be another candidate.

More discussion around glob: https://github.com/rust-lang-nursery/glob/issues/59

jszwedko avatar Nov 13 '20 14:11 jszwedko

globwalk looks good! I think we could go ahead and swap that in.

lukesteensen avatar Nov 14 '20 01:11 lukesteensen

I'm blocking this issue on https://github.com/rust-lang-nursery/glob/issues/59#issuecomment-762673096 due to https://github.com/timberio/vector/pull/5927#issuecomment-757759659.

pablosichert avatar Jan 19 '21 08:01 pablosichert

@pablosichert thanks. We should sync up and identify the exact upstream issues before working on them. It's possible this might not be worth the effort.

binarylogic avatar Jan 19 '21 15:01 binarylogic

Alternatively, there is libc::glob which we could use on Unix. Haven't found anything comparable for Windows yet.

Edit: Never mind, that one doesn't support patterns that recursively descend directories, like ./**/foo.txt.

pablosichert avatar Jan 19 '21 16:01 pablosichert

Blocking on that glob issue seems like the right call to me. There doesn't seem to be one clear place we can easily contribute to, and I don't think that this feature is worth the investment of striking out on our own.

lukesteensen avatar Jan 20 '21 17:01 lukesteensen

Closing in lieu of https://github.com/vectordotdev/vector/issues/5936

jszwedko avatar Dec 27 '22 23:12 jszwedko