fluentd
in_tail plugin: position data in pos_file is often deleted or disappears
Describe the bug
The position information for each file in the pos_file disappears, and only an empty pos_file remains.
After that, several symptoms occur: warnings such as "Unparsable line in pos_file: 000000000796950c" appear in the fluentd log, or position information is added again for files that have already been tailed, so they are tailed again from the beginning of the file.
To Reproduce
fluentd-conf.yaml
/fluentd/source-data/ is a directory on an NFS-mounted server
<source>
@type tail
path "/fluentd/source-data/#{hostname}__*.log"
pos_file "/fluentd/source-data/pos_files/#{hostname}.pos"
refresh_interval 5s
follow_inodes true
skip_refresh_on_startup true
read_from_head true
read_lines_limit 10000
tag tag
pos_file_compaction_interval 1s
<parse>
@type json
</parse>
</source>
Expected behavior
The pos_file should be compacted (entries deleted) only when a tracked file has been deleted or when a duplicate entry for a file has been appended.
Your Environment
- Fluentd version: fluentd:v1.15.3-debian-1.1
- Operating system: PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
- Kernel version: Linux 4.15.0-207-generic
Your Configuration
Same as the configuration shown under To Reproduce.
Your Error Log
2023-03-18 15:11:06 +0900 [warn]: #0 Unparsable line in pos_file: 000000000761339b
2023-03-18 15:11:08 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:08 +0900 [warn]: #0 Unparsable line in pos_file: 0000000006ed6a27
2023-03-18 15:11:09 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:09 +0900 [warn]: #0 Unparsable line in pos_file: 000000000726ca85
2023-03-18 15:11:09 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:11 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:11 +0900 [warn]: #0 Unparsable line in pos_file: 000000000796950c
2023-03-18 15:11:11 +0900 [info]: #0 Clean up the pos file
2023-03-18 15:11:12 +0900 [info]: #0 Clean up the pos file
Additional context
No response
Even after I moved the pos_file to local storage (not NFS), the same phenomenon occurred 🥲
Somehow the pos_file seems to be broken.
Please tell me how to reproduce it in local storage in more detail. Does this happen suddenly while Fluentd is running? Can you check the content of the pos_file just before it is cleaned up and tell me?
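One way to capture the pos_file content before a cleanup is to run a small checker against a copy of it. The following is only a minimal sketch: it assumes each entry has the layout "path, tab, hexadecimal offset, tab, hexadecimal inode", which is an assumption about in_tail's pos_file format and is not stated in this thread. The names check_pos_lines and check_pos_file are hypothetical helpers, not part of Fluentd.

```python
import re

# Assumed entry layout (not confirmed in this thread):
#   <path>\t<hex offset>\t<hex inode>
ENTRY = re.compile(r"^([^\t]+)\t([0-9a-fA-F]+)\t([0-9a-fA-F]+)$")


def check_pos_lines(lines):
    """Split pos_file lines into (parsed entries, unparsable lines)."""
    entries, bad = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        match = ENTRY.match(line)
        if match:
            path, pos, inode = match.groups()
            entries.append((path, int(pos, 16), int(inode, 16)))
        else:
            # e.g. a lone hex field such as "000000000796950c"
            bad.append(line)
    return entries, bad


def check_pos_file(pos_path):
    """Read a pos_file (hypothetical path) and classify its lines."""
    with open(pos_path, encoding="utf-8", errors="replace") as f:
        return check_pos_lines(f)
```

Under that assumed layout, a line like the ones in the warnings above (a lone 16-digit hex string with no path and no tabs) lands in the unparsable list, which would be consistent with an entry having been truncated or overwritten mid-write.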
@daipom
Please tell me how to reproduce it in local storage in more detail.
Two conditions are necessary to reproduce it.
The first is that path must be a folder on an NFS-mounted server,
and the second is that pos_file_compaction_interval must be set to a small value.
(When I set a large value such as 12h for pos_file_compaction_interval, other errors occurred.)
I think the key is NFS: there seems to be an issue when fluentd checks the structure and files inside the folder (path) on the NFS-mounted server.
Does this happen suddenly while Fluentd is running?
Yes.
Can you check the content of the pos_file just before it is cleaned up and tell me?
When I checked the content of the pos_file before it was cleaned up, there were no strangely formatted lines.
@SML0127
Thanks for the very helpful information!
I suspect some race conditions occurred in updating the pos_file.
I will check for possible conflicting processes in in_tail.
I have heard before that we should not put pos_file on NFS. I don't know the detailed reason for this, but I hope this information will help improve it!
Even after I moved the pos_file to local storage (not NFS), the same phenomenon occurred
I was wondering if this phenomenon happens on local storage as well.
The first is that path must be a folder on an NFS-mounted server,
So, in the end, does this happen only on NFS?
If so, it could be due to differences in disk write speeds, file system flushing timing, etc...
Until the cause of this problem is found and fixed, is it possible to work around this issue by setting longer pos_file_compaction_interval or putting pos_file on local storage?
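The workaround asked about above could look roughly like the fragment below. This is only a sketch based on the reporter's configuration: /var/log/fluentd/pos is a hypothetical local (non-NFS) directory, and 72h is an illustrative value. Note that the reporter mentioned seeing other errors with 12h and seeing the same phenomenon with a local pos_file, so this is not confirmed to avoid the problem.

```
<source>
  @type tail
  path "/fluentd/source-data/#{hostname}__*.log"
  # Assumption: hypothetical local directory instead of the NFS mount
  pos_file "/var/log/fluentd/pos/#{hostname}.pos"
  refresh_interval 5s
  follow_inodes true
  skip_refresh_on_startup true
  read_from_head true
  read_lines_limit 10000
  tag tag
  # Illustrative value: much longer than the original 1s
  pos_file_compaction_interval 72h
  <parse>
    @type json
  </parse>
</source>
```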
@daipom Thank you for your kind reply!
Since I'm not sure of the root cause of this issue or whether it works properly on a local system,
I'm thinking of not using the tail plugin.
I have one question: is there any guideline in the Fluentd documentation saying not to use NFS for the path of the tail plugin?
@SML0127
Environment-specific problems with in_tail have been pointed out for a while. I'm planning to make major improvements so that it can operate stably in a variety of environments.
I have one question: is there any guideline in the Fluentd documentation saying not to use NFS for the path of the tail plugin?
I don't think there is any such guideline.
I guess that's because we don't know for sure which environments or settings won't work.
At least it seems to me that pos_file is not designed to be placed on NFS.
So it is certainly better to have such a guideline.