khepri: lots of memory garbage after node startup
The metadata_store process's memory usage after a node restart is significantly higher than necessary.
The memory can be freed with force_gc. I've observed this behavior with different definition files;
10-queues-and-1M-topic-bindings.json is used below just as an example (it's large, so the difference is significant):
bazel run broker
rabbitmqctl enable_feature_flag raft_based_metadata_store_phase1
rabbitmqctl import_definitions ../sample-configs/topic-bindings/10-queues-and-1M-topic-bindings.json
rabbitmq-diagnostics memory_breakdown | grep metadata_store
metadata_store: 2.5856 gb (93.35%)
rabbitmqctl stop_app
rabbitmqctl start_app
rabbitmq-diagnostics memory_breakdown | grep metadata_store
metadata_store: 5.3593 gb (55.74%)
rabbitmqctl force_gc
rabbitmq-diagnostics memory_breakdown | grep metadata_store
metadata_store: 2.7336 gb (31.75%)
As you can see, after force_gc the usage is pretty much back to where it was after the import. (Why it's not the same or lower than before is potentially interesting as well, but the key part is that after boot, before force_gc, the node uses almost twice the memory it did before the restart.)
Running a forced GC after definition import may be a decent idea for the non-Khepri branches, too.
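For context, a node-wide forced GC is conceptually very simple: garbage-collect every process on the node. A minimal Erlang sketch of that idiom (not necessarily the exact implementation behind rabbitmqctl force_gc):

%% Garbage-collect every process on the node. This is the generic Erlang
%% idiom, not necessarily the exact code behind `rabbitmqctl force_gc`.
gc_all_processes() ->
    lists:foreach(fun erlang:garbage_collect/1, erlang:processes()).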
The snapshotting process is responsible for this large memory usage. While snapshotting is necessary during normal operation, taking several snapshots while importing definitions is not.
The snapshot interval indicates how often snapshots should be taken, and could be increased to avoid snapshotting during the import. However, this needs to be a temporary change only (as discussed with @kjnilsson).
In order to achieve this, some changes to khepri are necessary to introduce a new command to configure the snapshot_interval at runtime: https://github.com/rabbitmq/khepri/pull/136
The changes on the server to increase/restore the snapshot_interval from rabbit are in this commit: https://github.com/rabbitmq/rabbitmq-server/commit/4ddbfc619f8d6cb2aedc8e1ac6873d5d47d6e356
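Put together, this is a "raise, import, restore" pattern. A hypothetical Erlang sketch of the idea; set_snapshot_interval/2 and import_definitions/1 are illustrative stand-ins for the khepri#136 command and the definitions import, not actual APIs:

%% Hypothetical sketch: temporarily raise the snapshot interval so no
%% snapshots are taken mid-import, then restore it. All names here are
%% illustrative, not khepri's or rabbit's actual API.
import_with_fewer_snapshots(StoreId, File) ->
    Default = 4096,                                  %% Ra's default snapshot_interval (assumed)
    ok = set_snapshot_interval(StoreId, 100000000),  %% effectively "never" during the import
    try
        import_definitions(File)
    after
        ok = set_snapshot_interval(StoreId, Default) %% resume normal snapshotting
    end.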
I've re-tested this scenario with the most recent changes and it seems to be better. Not great, but better. ;)
$ rabbitmqctl enable_feature_flag raft_based_metadata_store_phase1
Enabling feature flag "raft_based_metadata_store_phase1" ...
$ time rabbitmqctl import_definitions ~/sample-configs/topic-bindings/10-queues-and-1M-topic-bindings.json
Importing definitions in JSON from a file at "/home/mkuratczyk/sample-configs/topic-bindings/10-queues-and-1M-topic-bindings.json" ...
Successfully started definition import. This process is asynchronous and can take some time.
________________________________________________________
Executed in  354.74 secs    fish           external
   usr time   38.01 secs    0.00 micros   38.01 secs
   sys time    6.00 secs  685.00 micros    6.00 secs
$ rabbitmq-diagnostics memory_breakdown
Reporting memory breakdown on node rabbit@kura...
metadata_store: 1.8983 gb (91.93%)
code: 0.0449 gb (2.17%)
allocated_unused: 0.0367 gb (1.78%)
other_system: 0.0304 gb (1.47%)
other_ets: 0.0263 gb (1.27%)
other_proc: 0.0178 gb (0.86%)
...
$ rabbitmq-diagnostics memory_breakdown | grep metadata_store
metadata_store: 1.8983 gb (91.93%)
$ rabbitmqctl stop_app
$ rabbitmqctl start_app
$ rabbitmq-diagnostics memory_breakdown | grep metadata_store
metadata_store: 3.2803 gb (68.51%)
$ rabbitmqctl force_gc
$ rabbitmq-diagnostics memory_breakdown | grep metadata_store
metadata_store: 2.7336 gb (96.08%)
I'm still surprised that after a restart and force_gc, it needs 1GB more than just after an import, but I haven't looked into this.
Retested a year later. It's significantly better, but to some extent it's still there. Perhaps we should call GC after startup?
After import:
metadata_store_ets: 1.1841 gb (64.7%)
metadata_store: 0.4415 gb (24.12%)
allocated_unused: 0.0662 gb (3.61%)
After stop/start:
reserved_unallocated: 4.3943 gb (66.48%)
metadata_store_ets: 1.1841 gb (17.91%)
metadata_store: 0.9155 gb (13.85%)
(after some time, reserved_unallocated drops to 100MB or so)
After force_gc:
metadata_store_ets: 1.1841 gb (63.62%)
metadata_store: 0.5298 gb (28.46%)
So it seems like we had roughly 400MB of garbage in the metadata_store (0.9155 GB after the restart vs 0.5298 GB after force_gc).
We now run GC at the end of node boot on the khepri branch. Perhaps we should do it in more places (e.g. after importing definitions via CLI tools or the HTTP API), but this issue (as I understood it) is specifically about node boot.
I changed the patch slightly so that the garbage collection is triggered from the rabbit application start function, not outside. This is so that rabbit stays as close as possible to a regular Erlang application. I also added the commit message as a comment, referencing this issue, because the commit itself will be squashed.
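For illustration, a minimal sketch of where that GC call sits, assuming it runs at the end of rabbit's application start callback; apart from the GC idiom itself, the names here are stand-ins, not the actual patch:

%% Simplified, hypothetical shape of rabbit's application start callback.
start(normal, []) ->
    {ok, SupPid} = start_rabbit_supervision_tree(),  %% stand-in for the real boot sequence
    %% Drop the garbage the metadata store accumulated while replaying its
    %% snapshot/log during boot (the subject of this issue).
    lists:foreach(fun erlang:garbage_collect/1, erlang:processes()),
    {ok, SupPid}.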