
File source after deletion and writing again not working properly

Open · skapix opened this issue 2 years ago

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

### Problem

After a file is removed and then immediately recreated and written to, Vector may keep the deleted file open and not see any new input for this file.

Steps to reproduce the bug, using the configuration attached below:

  1. Go to the working directory: `cd /tmp`
  2. Simulate rotation in bash; we do it via file removal: `for i in {1..101}; do echo "abc: $i" >> 1.txt; done; rm 1.txt; for i in {1..10100}; do echo "cde: $i" >> 1.txt; done;`
  3. Check whether the file has been released by Vector:

```
lsof -nP +L1
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NLINK     NODE NAME
vector    413099      root    9r   REG  259,2   100795     0  8126906 /tmp/1.txt (deleted)
```

This is the case when vector did not release the file.

If you see no held files, try again from step 2; it usually takes 2-3 attempts before Vector fails to release the file.
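The write/delete/write churn from the steps above can be wrapped in a small script for repeated attempts. This is only a sketch based on the commands in this report; it assumes Vector is already running with the configuration below and that `lsof` is available:

```shell
#!/bin/sh
# Reproduction sketch: write, delete, and immediately rewrite /tmp/1.txt
# (path and line counts taken from this report), then check whether
# Vector is still holding the deleted inode.
cd /tmp || exit 1
for i in $(seq 1 101);   do echo "abc: $i" >> 1.txt; done
rm 1.txt
for i in $(seq 1 10100); do echo "cde: $i" >> 1.txt; done
# A line ending in "(deleted)" means Vector kept the old handle open:
lsof -nP +L1 | grep 'vector.*1\.txt (deleted)' \
  || echo "file was released; retry the write/delete/write cycle"
```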

After this happens, vector does not see any input from this file:

```
$ tail -n 1 vector-1234.log
{"file":"/tmp/1.txt","host":"host","message":"abc: 101","source_type":"file","timestamp":"2023-01-10T10:52:36.667782990Z"}
```

Note that this is the last line emitted before the file was dropped; no data from the new file appears in the output log.

### Configuration

```yaml
sources:
  post_dummy_source:
    type: file
    include:
      - /tmp/*.txt
sinks:
  my_sink_id:
    type: file
    inputs:
      - post_*
    compression: none
    encoding:
      codec: json
    path: /tmp/vector-1234.log
```


### Version

vector 0.26.0 (x86_64-unknown-linux-gnu c6b5bc2 2022-12-05)

### Debug Output

_No response_

### Example Data

_No response_

### Additional Context

The issue happens both for a local Vector installation and for Vector running in Kubernetes.

### References

_No response_

skapix avatar Jan 10 '23 11:01 skapix

What a timing. This morning we just experienced a "too many open files" OS error coming from Vector on a file-based pipeline that also is supposed to be deleting old files. We discovered that indeed Vector did not delete all the files it should and exhausted the open files limit of the OS. The end effect is that lots of old files were left on the server and the disk started to fill up.

hhromic avatar Jan 10 '23 14:01 hhromic

I can only reproduce this issue if I run the same `for i in {1..101}; do echo "abc: $i" >> 1.txt; done; rm 1.txt; for i in {1..10100}; do echo "cde: $i" >> 1.txt; done;` command multiple times.

If I run this command instead:

```shell
for i in {1..101}; do echo "abc: $(date +%s.%N)" >> 1.txt; done; rm 1.txt; for i in {1..101}; do echo "cde: $(date +%s.%N)" >> 1.txt; done;
```

this problem does not seem to occur, since no two files will ever have the same fingerprint.

It's likely the problem is caused by the following sequence:

  • Vector is tailing a file whose first line is `cde: 1` (left over from a previous run of the command).
  • We append `abc: ..` lines onto this file, then the file is deleted.
  • We create a new file whose first line is again `cde: 1`.

Since the fingerprint hasn't changed, Vector doesn't realise that this is a new file so it keeps the file handle to the deleted file open.

I'm not sure there is a solution here. Real-world cases should not be creating files with the same fingerprint.

StephenWakely avatar Mar 07 '23 10:03 StephenWakely

> What a timing. This morning we just experienced a "too many open files" OS error coming from Vector on a file-based pipeline that also is supposed to be deleting old files. We discovered that indeed Vector did not delete all the files it should and exhausted the open files limit of the OS. The end effect is that lots of old files were left on the server and the disk started to fill up.

@hhromic Is it possible you are creating multiple files with the same fingerprint?

StephenWakely avatar Mar 07 '23 10:03 StephenWakely

Hi @StephenWakely, thanks for investigating this! Your findings are quite interesting.

> Is it possible you are creating multiple files with the same fingerprint?

We are using the following configuration:

```yaml
    fingerprint:
      strategy: checksum
      lines: 7
      ignored_header_bytes: 5
```

However, we are processing gzipped files and, as I reported in #13193, the checksum fingerprinter has issues with compressed data. Given what you reported, I now suspect that this issue (for us) might be related to the one I reported for gzipped data.

Is there a way for me to verify that Vector is seeing the same fingerprints to answer your question?
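One rough way to compare fingerprints by hand is to recompute an approximation of them outside Vector. This is only a sketch: `cksum` stands in for Vector's actual checksum function, but it mirrors the configured `ignored_header_bytes: 5` and `lines: 7`:

```shell
# Sketch: approximate "strategy: checksum, ignored_header_bytes: 5,
# lines: 7" by skipping 5 bytes and checksumming the next 7 lines.
fingerprint() {
  tail -c +6 "$1" | head -n 7 | cksum
}
cd "$(mktemp -d)" || exit 1
# Two files that differ only outside the fingerprinted prefix:
printf 'AAAAAx\ny\nz\n1\n2\n3\n4\ntail-one\n' > a.log
printf 'BBBBBx\ny\nz\n1\n2\n3\n4\ntail-two\n' > b.log
fingerprint a.log
fingerprint b.log   # same value: such a pair would collide
```

For gzipped files, the skipped bytes and the "lines" are compressed binary rather than text, which seems consistent with the problems reported in #13193.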

hhromic avatar Mar 07 '23 18:03 hhromic

I just realised that in our case the entity deleting the files is Vector itself (it is configured to do so in the file source). In the issue reported here and in the reproduction case being used, the entity deleting the files is external to Vector.

Therefore, it seems our issue is actually different from the one in this report :(.

hhromic avatar Mar 07 '23 19:03 hhromic

This issue does indeed happen due to the fingerprint.

We have a similar issue in our production environment. I will check whether it happens due to checksum equality; if not, I will open a new ticket to reproduce our production environment issue.

Thank you @StephenWakely!

skapix avatar Mar 09 '23 11:03 skapix