
[BUG] Memory usage steadily growing over time

Open • Thorsieger opened this issue 8 months ago • 11 comments

Describe the bug

I am experiencing a slow but steady memory leak that forces a service restart roughly once a week.

Logs

Memory usage over time on the physical server: [image not available]

pprof (on one instance):

  • ~20 minutes after startup:
Showing nodes accounting for 1917.94MB, 99.01% of 1937.09MB total
Dropped 34 nodes (cum <= 9.69MB)
Showing top 10 nodes out of 45
      flat  flat%   sum%        cum   cum%
 1025.26MB 52.93% 52.93%  1025.26MB 52.93%  github.com/dgryski/go-trigram.NewIndex
  618.16MB 31.91% 84.84%   618.16MB 31.91%  strings.(*Builder).grow (inline)
  101.44MB  5.24% 90.08%   101.44MB  5.24%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
  101.05MB  5.22% 95.29%   101.05MB  5.22%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).updateFileList.func3
      22MB  1.14% 96.43%   138.94MB  7.17%  github.com/go-graphite/go-carbon/receiver/tcp.(*TCP).HandleConnection
   17.23MB  0.89% 97.32%    17.23MB  0.89%  github.com/go-graphite/go-carbon/cache.(*Cache).makeQueue
      15MB  0.77% 98.09%       15MB  0.77%  github.com/go-graphite/go-carbon/points.OnePoint (inline)
   14.16MB  0.73% 98.82%    14.16MB  0.73%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
       2MB   0.1% 98.93%    37.12MB  1.92%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).fetchWithCache.func1
    1.62MB 0.084% 99.01%    35.12MB  1.81%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).prepareDataProto
  • ~15 hours after startup:
      flat  flat%   sum%        cum   cum%
 2481.68MB 21.47% 21.47%  2481.68MB 21.47%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
 1952.18MB 16.89% 38.36%  1952.18MB 16.89%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT
 1882.28MB 16.28% 54.64%  1882.28MB 16.28%  strings.(*Builder).grow
 1313.07MB 11.36% 66.00%  3921.19MB 33.92%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1205.57MB 10.43% 76.43%  1205.57MB 10.43%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
  931.52MB  8.06% 84.49%   931.52MB  8.06%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  561.05MB  4.85% 89.35%   561.05MB  4.85%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
  471.03MB  4.08% 93.42%   471.03MB  4.08%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
  184.90MB  1.60% 95.02%   184.90MB  1.60%  github.com/go-graphite/go-carbon/cache.(*Cache).Add
     161MB  1.39% 96.41%   767.56MB  6.64%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie
  • Today (~90 hours after startup):
Showing nodes accounting for 42091.68MB, 94.62% of 44485.60MB total
Dropped 171 nodes (cum <= 222.43MB)
Showing top 10 nodes out of 44
      flat  flat%   sum%        cum   cum%
12231.80MB 27.50% 27.50% 12231.80MB 27.50%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).getExpandedGlobs
 9761.41MB 21.94% 49.44%  9761.41MB 21.94%  github.com/go-graphite/protocol/carbonapi_v3_pb.(*FetchRequest).UnmarshalVT
 9495.70MB 21.35% 70.78%  9495.70MB 21.35%  strings.(*Builder).grow
 2663.25MB  5.99% 76.77%  2663.25MB  5.99%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).fullPath
 2278.12MB  5.12% 81.89%  2278.12MB  5.12%  github.com/go-graphite/go-carbon/carbonserver.newFileNode (inline)
 2016.61MB  4.53% 86.43%  6359.81MB 14.30%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).insert
 1363.03MB  3.06% 89.49%  1363.03MB  3.06%  github.com/go-graphite/go-carbon/carbonserver.(*trieNode).addChild (inline)
  830.51MB  1.87% 91.36%  4875.24MB 10.96%  github.com/go-graphite/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie
  749.19MB  1.68% 93.04%   751.19MB  1.69%  github.com/go-graphite/go-carbon/carbonserver.newGlobState
  702.04MB  1.58% 94.62%   702.04MB  1.58%  github.com/go-graphite/go-carbon/carbonserver.(*trieIndex).newDir (inline)
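
To compare the trend across these snapshots, a small Go sketch can turn the ~15 h and ~90 h `flat` values into an approximate growth rate per function. It treats the "~" uptimes as exact and assumes linear growth between the two points; the short labels are abbreviations of the full symbol names above, and `snapshot` and `growthRates` are hypothetical names for illustration. The 20-minute snapshot is left out because several of these functions do not appear in its top 10.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot holds the flat heap size (MB) per function at an approximate
// process uptime. Values are copied from the pprof top-10 listings in this
// issue; the uptimes are the "~15 hours" and "~90 hours" labels, taken as exact.
type snapshot struct {
	hours float64
	flat  map[string]float64
}

// growthRates returns the average growth in MB/hour for every function that
// appears in both snapshots, assuming linear growth between them.
func growthRates(a, b snapshot) map[string]float64 {
	rates := make(map[string]float64)
	dt := b.hours - a.hours
	for fn, mbA := range a.flat {
		if mbB, ok := b.flat[fn]; ok {
			rates[fn] = (mbB - mbA) / dt
		}
	}
	return rates
}

func main() {
	// Short labels for the full symbol names in the listings above.
	at15h := snapshot{hours: 15, flat: map[string]float64{
		"getExpandedGlobs": 2481.68,
		"UnmarshalVT":      1952.18,
		"Builder.grow":     1882.28,
		"trieIndex.insert": 1313.07,
		"newFileNode":      1205.57,
	}}
	at90h := snapshot{hours: 90, flat: map[string]float64{
		"getExpandedGlobs": 12231.80,
		"UnmarshalVT":      9761.41,
		"Builder.grow":     9495.70,
		"trieIndex.insert": 2016.61,
		"newFileNode":      2278.12,
	}}

	rates := growthRates(at15h, at90h)
	names := make([]string, 0, len(rates))
	for fn := range rates {
		names = append(names, fn)
	}
	// Print the fastest-growing functions first.
	sort.Slice(names, func(i, j int) bool { return rates[names[i]] > rates[names[j]] })
	for _, fn := range names {
		fmt.Printf("%-18s %6.1f MB/h\n", fn, rates[fn])
	}
}
```

With these inputs the sketch reports about 130 MB/h for getExpandedGlobs and roughly 100 MB/h each for UnmarshalVT and strings.(*Builder).grow, versus roughly 9 to 14 MB/h for the trie index functions (trieIndex.insert, newFileNode). Reading these as leak rates assumes the listings show in-use heap (`inuse_space`, the default view for a heap profile in `go tool pprof`).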

Go-carbon Configuration:

[common]
user = "carbon"
graph-prefix = "carbon.agents.{host}"
metric-endpoint = "tcp://10.254.0.36:2003"
metric-interval = "1m0s"
max-cpu = 6

[whisper]
data-dir = "/var/lib/graphite/whisper"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
workers = 8
max-updates-per-second = 10000
max-creates-per-second = 500
hard-max-creates-per-second = false
sparse-create = false
flock = true
enabled = true
hash-filenames = true
compressed = false
remove-empty-file = false

[cache]
max-size = 50000000
write-strategy = "noop"

[udp]
listen = ":2003"
enabled = false
buffer-size = 0

[tcp]
listen = ":2003"
enabled = true
buffer-size = 0

[pickle]
listen = ":2004"
max-message-size = 67108864
enabled = false
buffer-size = 0

[carbonlink]
listen = "127.0.0.1:7002"
enabled = true
read-timeout = "30s"

[grpc]
listen = "127.0.0.1:7003"
enabled = false

[tags]
enabled = false
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/var/lib/graphite/tagging/"
tagdb-timeout = "1s"

[carbonserver]
listen = "0.0.0.0:8080"
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"
query-cache-enabled = false
query-cache-size-mb = 40960
find-cache-enabled = true
trigram-index = false
scan-frequency = "5m0s"
trie-index = true
file-list-cache = ""
concurrent-index = false
realtime-index = 0
cache-scan = false
max-globs = 600
fail-on-max-globs = false
max-metrics-globbed  = 30000
max-metrics-rendered = 1000
empty-result-ok = false
internal-stats-dir = ""
stats-percentiles = [99, 98, 95, 75, 50]

[dump]
enabled = false
path = "/var/lib/graphite/dump/"
restore-per-second = 0

[pprof]
listen = "localhost:7007"
enabled = true

[[logging]]
logger = ""
file = "stdout"
level = "info"
encoding = "json"
encoding-time = "iso8601"
encoding-duration = "seconds"

Metric retention and aggregation schemas: N/A

Simplified query (if applicable): N/A

Additional context

I have a Graphite infrastructure that handles 2.4M metrics/minute. The storage layer consists of 4 go-carbon instances behind a carbon-c-relay. These 4 storage nodes run on a single physical server: 32 CPUs / 512 GB RAM / NVMe storage.
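
For a rough sense of the load and memory budget per go-carbon instance, the figures above can be split across the four instances. This Go sketch assumes that carbon-c-relay spreads traffic evenly and that RAM is shared evenly; both are simplifications, since the relay's routing configuration is not shown. `perInstanceRate` is a hypothetical helper.

```go
package main

import "fmt"

// perInstanceRate converts a cluster-wide ingest rate (metrics per minute) into
// points per second for one go-carbon instance. ASSUMPTION: the relay spreads
// traffic evenly across instances, which depends on its routing configuration.
func perInstanceRate(metricsPerMinute float64, instances int) float64 {
	return metricsPerMinute / float64(instances) / 60
}

func main() {
	const (
		totalPerMinute = 2.4e6 // cluster-wide ingest rate from the issue
		instances      = 4     // go-carbon instances behind carbon-c-relay
		serverRAMGB    = 512.0 // RAM of the shared physical server
	)
	fmt.Printf("ingest per instance: %.0f points/s\n", perInstanceRate(totalPerMinute, instances))
	fmt.Printf("even RAM share per instance: %.0f GB\n", serverRAMGB/instances)
}
```

Under the even-split assumption, each instance ingests about 10,000 points per second and has a 128 GB share of RAM, so the ~44.5 GB heap reported for one instance after ~90 hours is already roughly a third of that share.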

go-carbon version: ghcr.io/go-graphite/go-carbon:0.17.3

After checking existing issues, I tried the trie index, the trigram index, and both together, with no effect. I then enabled pprof; the output is above.

May be related to #579.

Thorsieger • Jul 01 '24 07:07