OOM deleting old snapshots from an S3 repository

Open SStorm opened this issue 2 years ago • 2 comments

CrateDB version

4.8.1

CrateDB setup information

CrateDB Cloud CR0 instance - 2 vCPU, 2 GiB RAM, 4 GiB storage.

CRATE_HEAP_SIZE: 512m
CRATE_JAVA_OPTS="-Dcom.sun.management.jmxremote.port=6666 -Dcom.sun.management.jmxremote.ssl=false
            -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false
            -Dcom.sun.management.jmxremote.rmi.port=6666 -Djava.rmi.server.hostname=127.0.0.1
            -javaagent:/var/lib/crate/crate-jmx-exporter-1.0.0.jar=7071 -XX:+HeapDumpOnOutOfMemoryError
            -XX:HeapDumpPath=/resource/heapdump -Dlog4j2.formatMsgNoLookups=true"

Steps to Reproduce

This particular database does not have a lot of data - 27k records, about 600MiB storage used - but has a few very wide tables with 1000 columns.

Can provide snapshot with all the data to restore. Can provide heap dump.

Both too large to attach to ticket.
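
The report does not include the exact statements that were run. As a rough sketch of the sequence involved, assuming a hypothetical repository name (`backup_repo`), snapshot name (`snap_old`), and placeholder bucket and credentials, it would look something like this:

```sql
-- Hypothetical names and placeholder credentials; not taken from the report.
CREATE REPOSITORY backup_repo TYPE s3
WITH (
    bucket = 'my-backup-bucket',
    access_key = '<access key>',
    secret_key = '<secret key>'
);

-- Snapshot all tables; per the report, this step succeeds.
CREATE SNAPSHOT backup_repo.snap_old ALL
WITH (wait_for_completion = true);

-- Deleting the snapshot is the step reported to end in an OOM.
DROP SNAPSHOT backup_repo.snap_old;
```

In the reported setup the snapshot covers a few very wide tables (about 1000 columns each), which may be relevant to how much heap the deletion needs.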

Expected Result

Creating a snapshot succeeds, but then DROP SNAPSHOT fails with an OOM, after hammering the GC for some time.

It should probably circuit-break and fail the DROP instead?

Actual Result

OOM

SStorm avatar Jul 28 '22 07:07 SStorm

@SStorm Sorry for the delay, this slipped through. Could you provide me the heap dump please? And also access to the snapshot if possible. Thank you!

seut avatar Aug 31 '22 14:08 seut

Snapshot and heap dump shared privately.

SStorm avatar Aug 31 '22 14:08 SStorm

After looking into the heap dump and the provided logs: not only snapshot deletion caused OOM exceptions but also regular writes, because this node was under very high memory pressure. It runs with only 512 MB of heap, which is below our recommendation of 1 GB for any use case other than very simple evaluation or minimal usage. Tuning our circuit breaker logic to be more accurate in its estimation could take a lot of effort, which we think isn't worth it for this rather unusual scenario. Instead, we advise increasing the heap to at least 1 GB.

seut avatar Oct 25 '22 10:10 seut