
Only warn about small files when they are not empty

Open binarylogic opened this issue 4 years ago • 5 comments

This is a follow up to https://github.com/timberio/vector/pull/863#issuecomment-544314174.

To recap, when the file source is configured to use the checksum fingerprinting strategy a warning log message is emitted letting the user know the file is too small to generate a fingerprint:

warn!(message = "Ignoring file smaller than fingerprint_bytes", file = ?path);

This is a very welcome feature for files that are actually too small. It helps to prevent confusion around why small files aren't being tailed. Unfortunately, it can produce a lot of noise in production environments where many empty files are present.

I'm unsure if we should ignore empty files, given that it could lead to the same confusion that we experienced previously with small files. Then again, I can see the reasoning behind ignoring empty files. I'm opening this up for discussion so we can come to a conclusion about what to do in this scenario.
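For discussion, here is a minimal sketch of the behavior the title proposes: stay quiet for empty files and warn only for non-empty files that are still shorter than `fingerprint_bytes`. This is not Vector's actual file source code. `SizeCheck`, `classify_len`, and `check_file` are hypothetical names, and `eprintln!` stands in for the `warn!` call quoted above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of comparing a file's length against the fingerprint threshold.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, PartialEq, Eq)]
enum SizeCheck {
    /// The file has at least `fingerprint_bytes` bytes and can be fingerprinted.
    Fingerprintable,
    /// The file is empty: skip it without logging anything.
    EmptySkipSilently,
    /// The file is non-empty but too short: skip it and warn.
    TooSmallWarn,
}

/// Pure decision logic, kept separate from I/O so it is easy to test.
fn classify_len(len: u64, fingerprint_bytes: u64) -> SizeCheck {
    if len >= fingerprint_bytes {
        SizeCheck::Fingerprintable
    } else if len == 0 {
        SizeCheck::EmptySkipSilently
    } else {
        SizeCheck::TooSmallWarn
    }
}

/// Reads the file length from metadata and warns only in the
/// "non-empty but too small" case.
fn check_file(path: &Path, fingerprint_bytes: u64) -> io::Result<SizeCheck> {
    let len = fs::metadata(path)?.len();
    let outcome = classify_len(len, fingerprint_bytes);
    if outcome == SizeCheck::TooSmallWarn {
        // Stand-in for the real logging macro.
        eprintln!("WARN Ignoring file smaller than fingerprint_bytes file={:?}", path);
    }
    Ok(outcome)
}
```

With an illustrative threshold of 256 bytes, `classify_len(0, 256)` is skipped silently, `classify_len(10, 256)` produces the warning, and `classify_len(256, 256)` is fingerprinted. The trade-off raised above still applies: a silently skipped empty file can be just as confusing as a silently skipped small one.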

binarylogic avatar Oct 21 '19 19:10 binarylogic

@karlseguin do you still feel strongly about this? I'm on the fence, so I'm leaning towards closing this. If this is causing you a lot of pain/noise we can probably implement a quick fix.

binarylogic avatar Feb 11 '20 18:02 binarylogic

Short: I don't feel strongly about it, no. Feel free to close.

Long:

Not related specifically to this issue, but in general I've found the file source with small files difficult to use. Either because of the above, or because it warns a lot about duplicate files. For example, we run Lynis daily and grep for warnings into a file. We might end up with something like:

lynis.error = "[warning] system needs a reboot"
lynis.error.1 = "[warning] system needs a reboot"
lynis.error.2 = "[warning] system needs a reboot"

I think at best this generates warnings, and at worst might lead to some messages being skipped (because it thinks they're duplicates).

I say "I think" that's what happens because we've largely solved this by doing one of three things, so I don't really remember the exact behavior. We do:

1 - Use the journal source where possible
2 - I generate extra noise for these very short things (like adding the date to the start of the output)
3 - I've moved to device_and_inode for anything else
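To illustrate why identical short files look like duplicates under a content-based fingerprint, and why prefixing the date helps, here is a rough sketch. It assumes a fingerprint computed from the first `fingerprint_bytes` bytes of a file. `content_fingerprint` is a hypothetical helper, and the standard library hasher stands in for whatever checksum Vector actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical fingerprint: a hash of the first `fingerprint_bytes` bytes.
/// Returns `None` when the content is too short to fingerprint.
fn content_fingerprint(content: &[u8], fingerprint_bytes: usize) -> Option<u64> {
    if content.len() < fingerprint_bytes {
        return None;
    }
    // `DefaultHasher::new()` uses fixed keys, so equal input bytes
    // always produce equal hashes within one build of the program.
    let mut hasher = DefaultHasher::new();
    hasher.write(&content[..fingerprint_bytes]);
    Some(hasher.finish())
}
```

Under this sketch, two files that both start with `[warning] system needs a reboot` get the same fingerprint, so a content-based strategy cannot tell them apart. If each file starts with a different date, the leading bytes differ and so do the fingerprints. A strategy like device_and_inode avoids the problem by not looking at content at all.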

Even less related, but a bigger issue is: https://github.com/timberio/vector/issues/1391 which has forced me to just ignore vector logs altogether. Now, that's very lazy of me, as I could just use vector to filter out those messages. I might feel more strongly about questionable use of WARN and ERROR levels if it wasn't for this issue.

karlseguin avatar Feb 12 '20 00:02 karlseguin

2023-07-24T22:13:28.614435Z  WARN source{component_kind="source" component_id=kube_logs component_type=kubernetes_logs component_name=kube_logs}:file_server: vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/longhorn-system_engine-image-ei-74783864-5wpcs_18b56f61-37a5-4982-b99a-69204c0848a2/engine-image-ei-74783864/2.log
2023-07-24T22:13:28.614605Z  WARN source{component_kind="source" component_id=kube_logs component_type=kubernetes_logs component_name=kube_logs}:file_server: vector::internal_events::file::source: Currently ignoring file too small to fingerprint. file=/var/log/pods/longhorn-system_engine-image-ei-74783864-5wpcs_18b56f61-37a5-4982-b99a-69204c0848a2/engine-image-ei-74783864/3.log

Nello-Angelo avatar Jul 24 '23 22:07 Nello-Angelo