vector
Only warn about small files when they are not empty
This is a follow up to https://github.com/timberio/vector/pull/863#issuecomment-544314174.
To recap, when the file source is configured to use the checksum fingerprinting strategy, a warning log message is emitted letting the user know the file is too small to generate a fingerprint:
warn!(message = "Ignoring file smaller than fingerprint_bytes", file = ?path);
This is a very welcome feature for files that are actually too small. It helps to prevent confusion around why small files aren't being tailed. Unfortunately, it can produce a lot of noise in production environments where many empty files are present.
I'm unsure whether we should silently ignore empty files, given that doing so could lead to the same confusion we experienced previously with small files. Then again, I can see the reasoning behind ignoring them. I'm opening this up for discussion so we can reach a conclusion about what to do in this scenario.
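The behavior proposed in the title can be sketched as a small decision function. This is a minimal sketch, not Vector's implementation. `FingerprintOutcome` and `classify` are hypothetical names. `eprintln!` stands in for the `warn!` macro so the example does not depend on `tracing`. `fingerprint_bytes` is passed in as a plain parameter.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical outcome of trying to fingerprint a file
/// (illustrative names, not Vector's actual types).
#[derive(Debug, PartialEq)]
enum FingerprintOutcome {
    /// The file is empty: skip it without logging anything.
    SkipSilently,
    /// The file has some bytes but fewer than `fingerprint_bytes`: skip it and warn.
    SkipWithWarning,
    /// Enough bytes were read to build a fingerprint from them.
    Ready(Vec<u8>),
}

/// Reads at most `fingerprint_bytes` from the start of the file and decides what to do.
fn classify(path: &Path, fingerprint_bytes: usize) -> io::Result<FingerprintOutcome> {
    let file = File::open(path)?;
    let mut buf = Vec::with_capacity(fingerprint_bytes);
    // `take` caps the read, so large files are never read in full.
    file.take(fingerprint_bytes as u64).read_to_end(&mut buf)?;

    Ok(if buf.is_empty() {
        // Proposed change: empty files produce no log line.
        FingerprintOutcome::SkipSilently
    } else if buf.len() < fingerprint_bytes {
        // Stand-in for the existing `warn!` call quoted above.
        eprintln!("WARN Ignoring file smaller than fingerprint_bytes: {:?}", path);
        FingerprintOutcome::SkipWithWarning
    } else {
        FingerprintOutcome::Ready(buf)
    })
}
```

The sketch differs from the warning quoted above only in its first branch. Small but non-empty files are still reported, so the earlier confusion about why they aren't tailed should not come back.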
@karlseguin, do you still feel strongly about this? I'm on the fence, so I'm leaning towards closing this. If this is causing you a lot of pain/noise, we can probably implement a quick fix.
Short: I don't feel strongly about it, no. Feel free to close.
Long:
Not specifically related to this issue, but in general I've found the file source difficult to use with small files, either because of the above or because it warns a lot about duplicate files. For example, we run Lynis daily and grep its warnings into a file. We might end up with something like:
lynis.error = "[warning] system needs a reboot"
lynis.error.1 = "[warning] system needs a reboot"
lynis.error.2 = "[warning] system needs a reboot"
I think at best this generates warnings, and at worst it might lead to some messages being skipped (because it thinks they're duplicates).
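The sketch below shows how a content-based fingerprint would treat the Lynis files above. It assumes the fingerprint is a hash of a fixed-size prefix of each file. `DefaultHasher` is swapped in for Vector's actual checksum to keep the example dependency-free, the too-small check is left out, and `checksum_fingerprint` and `duplicates` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for a checksum fingerprint: hashes the first `fingerprint_bytes`
/// bytes of the content. DefaultHasher is an assumption, not Vector's checksum.
fn checksum_fingerprint(content: &[u8], fingerprint_bytes: usize) -> u64 {
    let prefix = &content[..content.len().min(fingerprint_bytes)];
    let mut hasher = DefaultHasher::new();
    prefix.hash(&mut hasher);
    hasher.finish()
}

/// Returns the names of files whose fingerprint was already seen, i.e. the
/// files a checksum-based tailer would treat as duplicates and skip.
fn duplicates<'a>(files: &[(&'a str, &[u8])], fingerprint_bytes: usize) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for (name, content) in files {
        // `insert` returns false when this fingerprint was already recorded.
        if !seen.insert(checksum_fingerprint(content, fingerprint_bytes)) {
            dups.push(*name);
        }
    }
    dups
}
```

With three identical files, only the first one survives, which is the "messages being skipped" concern. Any two files that merely share their first `fingerprint_bytes` bytes would collide the same way.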
I say "I think" because we've largely worked around this in one of three ways, so I don't remember the exact behavior. We:
1 - Use the journal source where possible
2 - Add extra noise to these very short outputs (like prefixing the output with the date)
3 - Switch to device_and_inode for anything else
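The device_and_inode workaround avoids both problems because it identifies a file by filesystem metadata rather than by content. The following is a minimal Unix-only sketch. It assumes that the standard library's `MetadataExt::dev` and `MetadataExt::ino` are a fair stand-in for what Vector reads, and the helper name is hypothetical.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: identifies a file by its (device, inode) pair
/// instead of its content. `fs::metadata` follows symlinks.
fn device_and_inode(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    Ok((meta.dev(), meta.ino()))
}
```

With this approach, identical or empty files still get distinct identifiers, so neither the size warning nor the duplicate check applies. The general trade-off is that an inode number can be reused after a file is deleted, so a new file may be mistaken for an old one.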
Even less related, but a bigger issue is https://github.com/timberio/vector/issues/1391, which has forced me to ignore Vector's logs altogether. Admittedly, that's lazy of me, since I could use Vector itself to filter out those messages. I might feel more strongly about questionable uses of the WARN and ERROR levels if it weren't for this issue.
2023-07-24T22:13:28.614435Z WARN source{component_kind="source" component_id=kube_logs component_type=kubernetes_logs component_name=kube_logs}:file_server: vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/longhorn-system_engine-image-ei-74783864-5wpcs_18b56f61-37a5-4982-b99a-69204c0848a2/engine-image-ei-74783864/2.log
2023-07-24T22:13:28.614605Z WARN source{component_kind="source" component_id=kube_logs component_type=kubernetes_logs component_name=kube_logs}:file_server: vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/longhorn-system_engine-image-ei-74783864-5wpcs_18b56f61-37a5-4982-b99a-69204c0848a2/engine-image-ei-74783864/3.log