opentelemetry-collector-contrib
[filelogreceiver] CPU consumption increases (roughly) linearly with number of files watched
Component(s)
pkg/stanza, receiver/filelog
What happened?
Description
The CPU consumption of the filelog receiver increases (roughly) linearly as the number of files to be watched increases.
Steps to Reproduce
- Download otelcol-contrib (any recent release; tested on Linux AMD64 with versions from 0.77 to 0.86)
- Create 100 small files of about 1000 bytes each with random content, e.g.
  i=1; while [ $i -le 100 ]; do dd if=/dev/urandom bs=1000 count=1 | base64 > /tmp/logs/$i.log; i=$(($i+1)); done
- Run the OTel Collector with a simple pipeline of filelogreceiver -> loggingexporter and send the 100 files through the pipeline
- Observe the CPU utilization of the otelcol-contrib process. On my test system (AMD Zen3 laptop CPU), watching 100 files uses ~5% of a single core.
- Repeat from the file-creation step with an increased number of files to be watched.
- Observe the CPU utilization increase. On my test system, watching 1000 files requires ~110% of one core.
Expected Result
Good question.
I think that, ultimately, the expected result is that the default configuration of the filelog receiver protects the administrator from putting an unexpected burden on the host CPU.
Actual Result
With a default configuration, the filelog receiver can cause a significant increase in CPU utilization.
Collector version
v0.86.0
Environment information
Environment
OS: Arch Linux x86_64; Compiler (if manually compiled): N/A, using binaries from GitHub for testing.
OpenTelemetry Collector configuration
receivers:
  filelog/manyfiles:
    include:
      - /tmp/logs/*
exporters:
  logging:
    # verbosity: detailed ## if visual verification of pipeline is required
service:
  pipelines:
    logs/manyfiles:
      receivers:
        - filelog/manyfiles
      processors: []
      exporters:
        - logging
Log output
./otelcol-contrib --config minimal.yaml
2023-10-03T18:51:22.983+1100 info service@v0.86.0/telemetry.go:84 Setting up own telemetry...
2023-10-03T18:51:22.983+1100 info service@v0.86.0/telemetry.go:201 Serving Prometheus metrics {"address": ":8888", "level": "Basic"}
2023-10-03T18:51:22.983+1100 info exporter@v0.86.0/exporter.go:275 Deprecated component. Will be removed in future releases. {"kind": "exporter", "data_type": "logs", "name": "logging"}
2023-10-03T18:51:22.985+1100 info service@v0.86.0/service.go:138 Starting otelcol-contrib... {"Version": "0.86.0", "NumCPU": 16}
2023-10-03T18:51:22.985+1100 info extensions/extensions.go:31 Starting extensions...
2023-10-03T18:51:22.985+1100 info adapter/receiver.go:45 Starting stanza receiver {"kind": "receiver", "name": "filelog/manyfiles", "data_type": "logs"}
2023-10-03T18:51:22.988+1100 info service@v0.86.0/service.go:161 Everything is ready. Begin running and processing data.
2023-10-03T18:51:23.195+1100 info fileconsumer/file.go:196 Started watching file from end. To read preexisting logs, configure the argument 'start_at' to 'beginning' {"kind": "receiver", "name": "filelog/manyfiles", "data_type": "logs", "component": "fileconsumer", "path": "/tmp/logs/1.log"}
2023-10-03T18:51:23.258+1100 info fileconsumer/file.go:194 Started watching file {"kind": "receiver", "name": "filelog/manyfiles", "data_type": "logs", "component": "fileconsumer", "path": "/tmp/logs/560.log"}
2023-10-03T18:51:23.307+1100 info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 100}
2023-10-03T18:51:23.307+1100 info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 100}
Additional context
A quick look at pprof shows that most of the CPU time is spent on GC.
(pprof) top10
Showing nodes accounting for 6420ms, 58.95% of 10890ms total
Dropped 280 nodes (cum <= 54.45ms)
Showing top 10 nodes out of 173
flat flat% sum% cum cum%
1700ms 15.61% 15.61% 1700ms 15.61% runtime/internal/syscall.Syscall6
1220ms 11.20% 26.81% 1220ms 11.20% runtime.memclrNoHeapPointers
960ms 8.82% 35.63% 2360ms 21.67% runtime.scanobject
950ms 8.72% 44.35% 950ms 8.72% memeqbody
360ms 3.31% 47.66% 540ms 4.96% runtime.findObject
300ms 2.75% 50.41% 2780ms 25.53% runtime.gcDrain
300ms 2.75% 53.17% 490ms 4.50% runtime.greyobject
220ms 2.02% 55.19% 220ms 2.02% runtime.procyield
210ms 1.93% 57.12% 210ms 1.93% runtime.futex
200ms 1.84% 58.95% 200ms 1.84% encoding/base64.(*Encoding).Encode
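For anyone who wants to reproduce a profile like the one above, a minimal sketch, assuming your otelcol-contrib build includes the pprof extension (the official contrib releases do, as far as I know), is to add the extension to the test config:

extensions:
  pprof:
    endpoint: localhost:1777  # 1777 is, to my knowledge, the extension's default port

service:
  extensions: [pprof]

and then capture a CPU profile with go tool pprof "http://localhost:1777/debug/pprof/profile?seconds=30" while the files are being watched.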
Pinging code owners:
- filelogreceiver: @djaglowski
- pkg/stanza: @djaglowski
- receiver/filelog: @djaglowski
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Performance improvements are always welcome.
I'll see how far I can get with just some flamegraph analysis.
Thanks @cwegener. Feel free to share any insights you discover here.
My first instinct is that allocations are escaping to the heap. I'll have a look at which allocations those might be and whether the code can be changed so that the compiler uses stack-based allocation instead.
Looking at this again after some related discussions, I think I can explain what is going on here.
Basically, we are remembering N files (roughly 3x what we find in a poll interval). Then, for each of the M files we find in a poll interval, we open it, read a fingerprint from it, and compare it directly against the N files we remember (unless we find a match, in which case we stop short).
There is clearly an opportunity for improvement here, and I think recent refactoring has put us closer to a trie-based solution, which would give us much more efficient cross-referencing of files. Let's keep this open until we can make a substantial gain on this front.
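To make the cost concrete, here is a simplified Go sketch of the per-poll matching described above. It is illustrative only, not the actual pkg/stanza code; the type and function names are stand-ins, although a StartsWith check does show up in the profiles further down.

package main

import (
	"bytes"
	"fmt"
)

// fingerprint is a simplified stand-in for the fileconsumer fingerprint:
// the first bytes read from a file, used to recognize files across polls.
type fingerprint struct {
	firstBytes []byte
}

// startsWith reports whether fp begins with the (possibly shorter) old
// fingerprint, mirroring the role of Fingerprint.StartsWith in the profiles.
func (fp fingerprint) startsWith(old fingerprint) bool {
	n := len(old.firstBytes)
	return n <= len(fp.firstBytes) && bytes.Equal(fp.firstBytes[:n], old.firstBytes)
}

// matchKnown shows where the cost comes from: each of the M files found in a
// poll is compared against up to N remembered fingerprints, so every poll does
// on the order of M*N byte comparisons, on top of the M open/read syscalls
// needed to produce the new fingerprints in the first place.
func matchKnown(found, remembered []fingerprint) (matched int) {
	for _, fp := range found { // M files found this poll
		for _, old := range remembered { // up to N remembered files
			if fp.startsWith(old) {
				matched++
				break // stop short once a match is found
			}
		}
	}
	return matched
}

func main() {
	// Tiny synthetic example: one of the two "found" files is already known.
	remembered := []fingerprint{{firstBytes: []byte("alpha")}, {firstBytes: []byte("beta")}}
	found := []fingerprint{{firstBytes: []byte("alpha log line ...")}, {firstBytes: []byte("gamma log line ...")}}
	fmt.Println(matchKnown(found, remembered)) // 1
}

With N remembered files and M files found per poll, every poll interval performs on the order of M*N fingerprint comparisons plus M open/read syscalls, which lines up with the Syscall6 and memeqbody samples at the top of the profiles.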
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
- receiver/filelog: @djaglowski
- pkg/stanza: @djaglowski
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hello. Just a small comment/feedback: in our environments this bug is a blocker, so we had to drop the filelog receiver and switch to another (non-OpenTelemetry) solution for a specific case.
Thank you for the additional feedback @afoninsky. We will take a serious look at this before moving the component to GA.
The most impactful mitigation I've found is to lengthen the poll_interval to 1s.
With a value of 200ms, my understanding is that the collector checks all files 5 times per second. With 1000+ files, that is on the order of 5000 open/read/compare operations per second, which creates significant overhead.
With poll_interval: 200ms
(pprof) top10
Showing nodes accounting for 17500ms, 47.44% of 36890ms total
Dropped 530 nodes (cum <= 184.45ms)
Showing top 10 nodes out of 250
flat flat% sum% cum cum%
4250ms 11.52% 11.52% 4250ms 11.52% runtime/internal/syscall.Syscall6
3030ms 8.21% 19.73% 3030ms 8.21% memeqbody
2330ms 6.32% 26.05% 3020ms 8.19% encoding/json.appendCompact
2010ms 5.45% 31.50% 2010ms 5.45% runtime.memclrNoHeapPointers
1400ms 3.80% 35.29% 4620ms 12.52% runtime.scanobject
1150ms 3.12% 38.41% 4260ms 11.55% github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/fileset.(*Fileset[go.shape.*uint8]).Match
960ms 2.60% 41.01% 1270ms 3.44% runtime.typePointers.next
890ms 2.41% 43.43% 1920ms 5.20% runtime.pcvalue
860ms 2.33% 45.76% 860ms 2.33% runtime.futex
620ms 1.68% 47.44% 620ms 1.68% encoding/json.stateInString
With poll_interval: 1s
(pprof) top10
Showing nodes accounting for 3650ms, 46.09% of 7920ms total
Dropped 238 nodes (cum <= 39.60ms)
Showing top 10 nodes out of 269
flat flat% sum% cum cum%
830ms 10.48% 10.48% 830ms 10.48% runtime/internal/syscall.Syscall6
790ms 9.97% 20.45% 790ms 9.97% memeqbody
460ms 5.81% 26.26% 580ms 7.32% encoding/json.appendCompact
310ms 3.91% 30.18% 930ms 11.74% runtime.scanobject
280ms 3.54% 33.71% 280ms 3.54% runtime.memclrNoHeapPointers
260ms 3.28% 36.99% 1050ms 13.26% github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/fileset.(*Fileset[go.shape.*uint8]).Match
230ms 2.90% 39.90% 450ms 5.68% runtime.pcvalue
170ms 2.15% 42.05% 930ms 11.74% github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/fingerprint.Fingerprint.StartsWith (inline)
160ms 2.02% 44.07% 160ms 2.02% runtime.futex
160ms 2.02% 46.09% 160ms 2.02% runtime.step
The next mitigation will be to reduce max_concurrent_files, because files are read concurrently; a config sketch combining both mitigations is below.
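For reference, a minimal sketch of the receiver config with both mitigations applied (the values are illustrative starting points, not recommendations):

receivers:
  filelog/manyfiles:
    include:
      - /tmp/logs/*
    poll_interval: 1s
    max_concurrent_files: 512

Roughly speaking, both settings trade latency and throughput for CPU: a longer poll_interval means new log lines are picked up less often, and a lower max_concurrent_files means a large set of files takes more batches to work through.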
Hope this helps other readers mitigate the issue.