File Source: Unicode Null Characters
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
`.message` fields contain a lot of "\u0000" characters before the real line content.
Configuration
sources:
t_log:
type: "file"
include: ["/***********log.tmp.log"]
line_delimiter: "\n"
glob_minimum_cooldown: 60000 # 1 min
transforms:
t_log_to_json:
type: "remap"
inputs: ["t_log"]
source: ".=parse_json!(.message)"
sinks:
t_log_to_clickhouse:
type: "clickhouse"
inputs: ["t_log_to_json"]
endpoint: "*******"
database: "t_log"
table: "{{ __ch_table }}"
skip_unknown_fields: true
auth:
    # ......
Version
Seen on 0.33 and 0.35
Debug Output
// too intermittent to capture a relevant log :/
Example Data
{"__ch_table":"metric_log","datetime":"2024-01-09 09:59:59","context_id":"1161bde682c4ed13f968000000000000","instance_id":0000,"instance_name":"eeeeeeeeeeee","method":"GET","path":"\/escaped\/path\/the\/hell\/currents","query":"","memory_usage":1123840,"nb_select":11,"nb_insert":0,"nb_update":0,"nb_delete":0,"nb_other":0,"nb_resort_select":0,"nb_resort_insert":0,"nb_resort_update":0,"nb_resort_delete":0,"nb_resort_other":0,"nb_common_select":0,"nb_common_insert":0,"nb_common_update":0,"nb_common_delete":0,"nb_common_other":0}
Additional Context
I have 2 servers that output their logs to the same NFS storage. I use a third server to run Vector, which reads the data from the NFS storage.
Someone else is having the same issue on the Discord, but they are using S3 storage instead of NFS. This looks like an issue with network storage :/
References
https://discord.com/channels/742820443487993987/1194253312212615219
https://discord.com/channels/742820443487993987/1149066915923365898
From Discord :
jches: Skimming some old NFS mailing list posts, it sounds like this (reading null bytes from a file) is just something that can happen if the file is being read while it's open for writing. You'll probably need to take a similar approach as in that thread, so vector is only reading files that aren't being written to anymore https://www.spinics.net/lists/linux-nfs/msg49803.html
So this is not a 'bug' in Vector ...
But the file source needs an option to simply ignore the last line when it starts with a null byte ...
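In the meantime, a workaround can be sketched in the pipeline itself: a `filter` transform that drops events whose message starts with a NUL byte, placed between the file source and the remap. This is only a sketch based on the config above; the transform name `t_log_drop_nulls` is made up, and it assumes the VRL `starts_with` function sees the raw message including the null characters:

```yaml
transforms:
  # Hypothetical workaround: drop lines that begin with a NUL byte
  # before they reach parse_json, instead of a built-in source option.
  t_log_drop_nulls:
    type: "filter"
    inputs: ["t_log"]
    condition: '!starts_with!(.message, "\u0000")'
  t_log_to_json:
    type: "remap"
    inputs: ["t_log_drop_nulls"]
    source: ".=parse_json!(.message)"
```

Note this discards the whole partially-read line rather than retrying it later, so it trades occasional data loss for clean output; stripping the NUL prefix with VRL's `replace` instead would keep whatever real content follows.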
You can try this change for a similar issue, built on the v0.38.0 sources: https://github.com/tamer-hassan/vector/commit/518d4e17db2a698491cc3927df39de676d7ef523 I built it only for Windows, since that is what I was concerned with.