atlasdb

[PDS-117310] KvTableMappingService.updateTableMap is spammed from TS threads when a table is deleted

Open jeremyk-91 opened this issue 4 years ago • 0 comments

Count: 12595
"com.palantir.logsafe.exceptions.SafeRuntimeException: I exist to show you the stack trace
	at <redacted> at com.palantir.atlasdb.keyvalue.impl.KvTableMappingService.lambda$updateTableMap$0(KvTableMappingService.java:95)
	at java.util.concurrent.atomic.AtomicReference.updateAndGet(AtomicReference.java:179)
	at com.palantir.atlasdb.keyvalue.impl.KvTableMappingService.updateTableMap(KvTableMappingService.java:95)
	at com.palantir.atlasdb.keyvalue.impl.KvTableMappingService.getMappedTableRef(KvTableMappingService.java:184)
	at com.palantir.atlasdb.keyvalue.impl.KvTableMappingService.getMappedTableName(KvTableMappingService.java:175)
	at com.palantir.atlasdb.keyvalue.impl.TableRemappingKeyValueService.deleteAllTimestamps(TableRemappingKeyValueService.java:133)
	at com.palantir.atlasdb.keyvalue.impl.TableSplittingKeyValueService.deleteAllTimestamps(TableSplittingKeyValueService.java:149)
	at 

Jeremy Kong added a comment - 06/May/20 4:41 PM

Argh, apologies this slipped. So this seems to happen only in the targeted sweeper threads.

This looks like a use case for a CoalescingSupplier, or at least some algorithm built around one. A bunch of threads might still block, but the behaviour would be: when a thread makes a request and there is no in-flight request, it starts one; otherwise it waits to join the next batch of requests. Normally this adds half the average latency to the call, but since this is inside targeted sweep there is no direct user impact, and I think we probably win enough from not slamming the DB that waiting is the better trade.
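To make the batching behaviour concrete, here is a minimal sketch of a coalescing supplier. It is an illustration under assumptions, not AtlasDB's actual `CoalescingSupplier`: the class name `SimpleCoalescingSupplier` and its fields are hypothetical, and the real class may be implemented quite differently. A caller joins the batch that has not started yet; the first member of that batch to acquire the lock runs the delegate once, and every other member reuses the result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch of request coalescing; not the AtlasDB implementation. */
final class SimpleCoalescingSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;

    // Held by whichever thread is currently running the delegate.
    private final ReentrantLock executionLock = new ReentrantLock();

    // The batch that has not started executing yet; new callers join this one.
    private volatile CompletableFuture<T> pendingBatch = new CompletableFuture<>();

    SimpleCoalescingSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        // Join the batch that is still waiting to run. Its delegate call can
        // only start after this read, so the result is never older than this call.
        CompletableFuture<T> myBatch = pendingBatch;

        executionLock.lock();
        try {
            if (!myBatch.isDone()) {
                // First member of the batch to get the lock: later arrivals
                // must go to a fresh batch, then run the delegate exactly once.
                pendingBatch = new CompletableFuture<>();
                try {
                    myBatch.complete(delegate.get());
                } catch (Throwable t) {
                    myBatch.completeExceptionally(t);
                }
            }
            // Otherwise another member already ran the delegate for this batch.
        } finally {
            executionLock.unlock();
        }
        // Rethrows delegate failures wrapped in a CompletionException.
        return myBatch.join();
    }
}
```

In the scenario from the stack trace, the delegate would be whatever reloads the table map, so many sweeper threads asking about a deleted table at once would share a single reload instead of each issuing its own. Threads that arrive while a reload is in flight wait for it to finish and then share the next one, which is where the extra latency comes from.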

@gmaretic might be worth a sanity check that the above is reasonable?

Thanks Jeremy

jeremyk-91 • May 12 '20 20:05