Promtail: Add compressed files support
What this PR does / why we need it: Adds to Promtail the ability to read compressed files. It works by:
- Inferring which compression format to use, based on the file extension
- Uncompressing the file with the native golang/compress packages
- Iterating over the uncompressed lines and sending them to Loki (a minimal sketch of this flow follows this list)
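Purely to illustrate the flow above (not the PR's actual code), here is a minimal sketch using only the standard library; the `newReader` helper and the hard-coded path are hypothetical:

```go
package main

import (
	"bufio"
	"compress/bzip2"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"strings"
)

// newReader wraps f with a decompressor chosen from the file extension,
// or returns f unchanged for uncompressed files.
func newReader(path string, f *os.File) (io.Reader, error) {
	switch {
	case strings.HasSuffix(path, ".gz"): // also matches .tar.gz
		return gzip.NewReader(f) // streaming: the file is not expanded up front
	case strings.HasSuffix(path, ".bz2"):
		return bzip2.NewReader(f), nil
	default:
		return f, nil // plain files are read as-is
	}
}

func main() {
	path := "/logfiles/file1.tar.gz"
	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r, err := newReader(path, f)
	if err != nil {
		panic(err)
	}

	// Iterate over the uncompressed lines; Promtail would forward these to Loki.
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
```

Note that for .tar.gz this sketch only strips the gzip layer; it doesn't unpack tar entries.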
Its usage is the same as our current file tailing. In the example below, all files under the logfiles folder are parsed/processed, regardless of whether they are compressed: file1 and file2 are first uncompressed with gunzip; file3 is parsed as it is.
```yaml
scrape_configs:
  - job_name: simplejob
    static_configs:
      - targets: [localhost]
        labels:
          job: compressedfolder
          __path__: /logfiles/**.*
```

```
$ ls /logfiles
file1.tar.gz file2.tar.gz file3.txt
```
Which issue(s) this PR fixes: Fixes #5956
Special notes for your reviewer: Another approach would be to infer the compression format from the file's header bytes, but the extension-based approach is much simpler; we can improve it in the future if the community needs it.
Checklist
- [ ] Documentation added
- [ ] Tests updated
- [ ] Is this an important fix or new feature? Add an entry in the CHANGELOG.md.
- [ ] Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md
```
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
```
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
```
+ ingester           0%
+ distributor        0%
+ querier            0%
+ querier/queryrange 0%
+ iter               0%
+ storage            0%
+ chunkenc           0%
+ logql              0%
+ loki               0%
```
@dannykopping @frittentheke thank you both for the fantastic reviews. I'll be working on those suggestions, and on any others that come along during the week, and I plan to test a few things with this too.
Just one thing: even if this doesn't add decent support for compressed log rotation, I think we should still proceed with it as a simpler implementation to be improved in a follow-up PR (I can start working on that right after this gets merged). My argument is that, as of now, Loki doesn't support even the most straightforward compression scenarios, so this would at least add some support without much burden. For instance, I like that with this PR you can point Promtail at a folder full of compressed files, keep appending new compressed files to it, and all of them will be scraped.
> For instance, I like that with this PR you can point Promtail at a folder full of compressed files, keep appending new compressed files to it, and all of them will be scraped.

Yes. The question, though, is in what order those files will be picked up. Does Promtail sort them by modification date? That would at least preserve as much ordering as possible.
> Yes. The question, though, is in what order those files will be picked up. Does Promtail sort them by modification date? That would at least preserve as much ordering as possible.

Good call, thanks. I'll add this scenario to the list of things I'll be testing this week. Of course, even if I implement some kind of sorting there will still be edge cases, because the managers check things asynchronously, but it would certainly be more stable than not having any sorting (which I'm not sure we have as of now).
FYI: I generated a ~400 MB file and compressed it with gzip, which reduced it to ~1 MB, and apparently the gzip reader didn't expand the whole file before iterating over it, which is great. I'll try one more time with a bigger file (a few GBs).
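If it helps to make that check repeatable, here's a tiny standalone sketch of the streaming behaviour (the file name is hypothetical): it pushes the decompressed bytes through `io.Copy` and prints heap usage afterwards, which should stay far below the uncompressed size.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"runtime"
)

func main() {
	// Hypothetical fixture: a small .gz file that expands to hundreds of MB.
	f, err := os.Open("big.log.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	zr, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer zr.Close()

	// Stream the decompressed bytes without buffering them all in memory.
	n, err := io.Copy(io.Discard, zr)
	if err != nil {
		panic(err)
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("decompressed %d bytes, heap in use: %d bytes\n", n, m.HeapInuse)
}
```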
> Yes. The question, though, is in what order those files will be picked up. Does Promtail sort them by modification date? That would at least preserve as much ordering as possible.

> Good call, thanks. I'll add this scenario to the list of things I'll be testing this week. Of course, even if I implement some kind of sorting there will still be edge cases, because the managers check things asynchronously, but it would certainly be more stable than not having any sorting (which I'm not sure we have as of now).

I've tested/investigated it and no, Promtail doesn't follow any order. I tried to look for a quick solution, but to be honest it won't be that easy, because:
- The discovery is based on the filename/glob
- Once the files to be tailed are discovered, the target manager iterates over them and sends them to a channel
- They're processed as they're received on the channel
We can try sorting them by name before publishing them on the channel, but ordering them by creation date looks to be a challenge.
EDIT: never mind, I can actually read the creation/modification date from the name and sort things beforehand (a rough sketch of the sorting idea follows below).
EDIT2: even after sorting the targets, they might still be synced out of order because tailing is completely async. I implemented a solution and wrote a few unit tests, but they fail because tailing itself is asynchronous: even if I start tailing in the right order, the files might still be processed in a different order.
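A rough sketch of that sorting idea, for reference (a hypothetical helper, not the target manager's real code; it uses `os.Stat`'s ModTime rather than parsing dates out of file names):

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// sortByModTime orders discovered paths by modification time (oldest first)
// before they would be published on the tailing channel.
func sortByModTime(paths []string) []string {
	sort.SliceStable(paths, func(i, j int) bool {
		fi, erri := os.Stat(paths[i])
		fj, errj := os.Stat(paths[j])
		if erri != nil || errj != nil {
			return paths[i] < paths[j] // fall back to name order on stat errors
		}
		return fi.ModTime().Before(fj.ModTime())
	})
	return paths
}

func main() {
	paths := sortByModTime([]string{"/logfiles/file2.tar.gz", "/logfiles/file1.tar.gz"})
	for _, p := range paths {
		// Oldest first; tailing itself may still interleave asynchronously.
		fmt.Println(p)
	}
}
```

Even with this, as noted in EDIT2, the ordering only holds up to the point where the targets actually start tailing.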
@dannykopping FYI: I finished the tests I was interested in. Anything else you'd like me to test/improve?
> We can try sorting them by name before publishing them on the channel, but ordering them by creation date looks to be a challenge.
> EDIT: never mind, I can actually read the creation/modification date from the name and sort things beforehand.
> EDIT2: even after sorting the targets, they might still be synced out of order because tailing is completely async. I implemented a solution and wrote a few unit tests, but they fail because tailing itself is asynchronous: even if I start tailing in the right order, the files might still be processed in a different order.

This seems to be part of a larger issue of only using file names and not the inode numbers and the modification date. See my comment about this at https://github.com/grafana/loki/issues/6503#issuecomment-1191957798
So this might be a fix/improvement for another day. But since Loki does not really love older timestamps being ingested, Promtail should do what it can to read from files targeting the same labels in the proper order ...
> FYI: I generated a ~400 MB file and compressed it with gzip, which reduced it to ~1 MB, and apparently the gzip reader didn't expand the whole file before iterating over it, which is great. I'll try one more time with a bigger file (a few GBs).
Can we add a benchmark for these so the tests are repeatable, and we can see what the CPU/memory impact is?
> FYI: I generated a ~400 MB file and compressed it with gzip, which reduced it to ~1 MB, and apparently the gzip reader didn't expand the whole file before iterating over it, which is great. I'll try one more time with a bigger file (a few GBs).

> Can we add a benchmark for these so the tests are repeatable, and we can see what the CPU/memory impact is?

I'm looking at it, but it's more complicated than I thought. Out of curiosity, would you say I should generate a big file, compress it, and benchmark everything programmatically, or were you thinking of something different?
> Out of curiosity, would you say I should generate a big file, compress it, and benchmark everything programmatically
Yeah this is what I'd love to see.
As an aside, I went looking in the Go source and was surprised not to find benchmarks in the gzip package, although there are a couple elsewhere in compress:
```
compress [tags/go1.18.5] $ grep -rn Benchmark
flate/writer_test.go:16:func BenchmarkEncode(b *testing.B) {
flate/reader_test.go:34:func BenchmarkDecode(b *testing.B) {
lzw/writer_test.go:199:func BenchmarkEncoder(b *testing.B) {
lzw/reader_test.go:267:func BenchmarkDecoder(b *testing.B) {
bzip2/bzip2_test.go:238:func BenchmarkDecodeDigits(b *testing.B) { benchmarkDecode(b, digits) }
bzip2/bzip2_test.go:239:func BenchmarkDecodeNewton(b *testing.B) { benchmarkDecode(b, newton) }
bzip2/bzip2_test.go:240:func BenchmarkDecodeRand(b *testing.B)   { benchmarkDecode(b, random) }
```
What I was thinking we could do is:
- add a text file to the test fixtures which is sufficiently big but highly compressible (a couple MB of server logs?)
- in the benchmark, compress it and save to a file
- start the benchmark timer and begin uncompressing the file
- take note of allocs, MB/s, etc
We may see that one or two compression formats cannot be partially uncompressed and lead to unbounded memory growth, in which case we'll need to find an alternative to the stdlib packages, or put a big warning in the docs for that format.
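Something along these lines, perhaps (a sketch only: the fixture path and benchmark name are hypothetical, and the real benchmark would exercise Promtail's own reader rather than a bare `bufio.Scanner`):

```go
package file

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"os"
	"testing"
)

func BenchmarkReadCompressedLines(b *testing.B) {
	// Hypothetical fixture: a few MB of highly compressible server logs.
	plain, err := os.ReadFile("testdata/server.log")
	if err != nil {
		b.Fatal(err)
	}

	// Compress once, outside the timed section.
	var compressed bytes.Buffer
	zw := gzip.NewWriter(&compressed)
	if _, err := zw.Write(plain); err != nil {
		b.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.SetBytes(int64(len(plain)))
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		zr, err := gzip.NewReader(bytes.NewReader(compressed.Bytes()))
		if err != nil {
			b.Fatal(err)
		}
		scanner := bufio.NewScanner(zr)
		for scanner.Scan() {
			_ = scanner.Bytes() // stand-in for sending the line to Loki
		}
		if err := scanner.Err(); err != nil {
			b.Fatal(err)
		}
	}
}
```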
```
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
```
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
```
+ ingester           0%
+ distributor        0%
+ querier            0%
+ querier/queryrange 0%
+ iter               0%
+ storage            0%
+ chunkenc           0%
+ logql              0.4%
+ loki               0%
```
Benchmark results with a 3 MB buffer size for decompression:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -tags integration,requires_docker -bench ^BenchmarkReadlines$ github.com/grafana/loki/clients/pkg/promtail/targets/file
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/file
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkReadlines/2000_lines_of_log_.tar.gz_compressed-12     768    1559020 ns/op   3063606 B/op     239 allocs/op
BenchmarkReadlines/100000_lines_of_log_.gz_compressed-12        18   63992058 ns/op   5666200 B/op   21157 allocs/op
PASS
ok      github.com/grafana/loki/clients/pkg/promtail/targets/file      3.221s
```
Benchmark results with a 4096-byte buffer size:
```
Running tool: /usr/local/go/bin/go test -benchmem -run=^$ -tags integration,requires_docker -bench ^BenchmarkReadlines$ github.com/grafana/loki/clients/pkg/promtail/targets/file
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/clients/pkg/promtail/targets/file
cpu: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
BenchmarkReadlines/2000_lines_of_log_.tar.gz_compressed-12     933    1261038 ns/op     60287 B/op     236 allocs/op
BenchmarkReadlines/100000_lines_of_log_.gz_compressed-12        19   58614711 ns/op   2536763 B/op   20279 allocs/op
PASS
ok      github.com/grafana/loki/clients/pkg/promtail/targets/file      3.167s
```
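For context on what the buffer size above refers to: one plausible place such a read buffer sits is between the decompressor and the line scanner, roughly as below (assumed plumbing for illustration only, not the PR's actual code; the path is hypothetical):

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("/logfiles/file1.tar.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	zr, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer zr.Close()

	// 4096 bytes as in the second run above; the first run used 3 * 1024 * 1024.
	const bufSize = 4096
	scanner := bufio.NewScanner(bufio.NewReaderSize(zr, bufSize))

	lines := 0
	for scanner.Scan() {
		lines++
	}
	fmt.Println("lines read:", lines)
}
```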
@dannykopping Merging, as I've addressed all your great suggestions. I'll be working on zip compression in a follow-up PR :smile: