Bug: Race condition when datastore worker jobs process large files in quick succession during active usage
Version: 1.0.1 (tweaked version: 1.0.1-qgov.6), CKAN Version: 2.10.1
We have been seeing deadlocks on our latest XLoader deployment; recent changes appear to be the cause. When it happens, the three queries shown in the table below end up waiting on each other, which also causes a major CPU spike.
We have been able to recover by killing the deadlocked worker (or waiting for it to time out), or by using pgAdmin or a similar tool to kill the locking pid.
Any ideas on how to avoid deadlocking the database in the first place would be appreciated.
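For reference, the pgAdmin step above amounts to terminating the blocking backend. A minimal sketch, assuming psycopg2 and a role permitted to signal backends; the pid is a placeholder, not a prescribed value:

```python
# Minimal sketch: terminate the backend holding the lock, equivalent to the
# manual pgAdmin step described above. Substitute the blocking pid reported
# by pg_stat_activity / pg_blocking_pids().
import psycopg2

BLOCKING_PID = 16493  # placeholder value taken from the table below

conn = psycopg2.connect("dbname=user_datastore")  # adjust DSN/credentials as needed
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("SELECT pg_terminate_backend(%s)", (BLOCKING_PID,))
    print(cur.fetchone())  # (True,) if the backend was signalled
conn.close()
```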
| backend type | start | last update | pid | blocking pid | wait event | db | sql |
|---|---|---|---|---|---|---|---|
| client backend | 2023-11-30 23:13:57 UTC | 2023-11-30 23:13:57 UTC | 16502 | 16493 | Lock: relation | user_datastore | SELECT pg_indexes_size('775e9be7-eecd-47d6-b72f-18c3cebc137d') |
| client backend | 2023-11-30 23:13:55 UTC | 2023-11-30 23:13:57 UTC | 16493 | 14752 | Lock: relation | user_datastore | DROP TABLE "775e9be7-eecd-47d6-b72f-18c3cebc137d" CASCADE |
| client backend | 2023-11-30 23:09:17 UTC | 2023-11-30 23:13:56 UTC | 4752 | | Client: ClientRead | user_datastore | SELECT count(_id) FROM "775e9be7-eecd-47d6-b72f-18c3cebc137d" |
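A snapshot like the one above can be captured with something along these lines; this is an approximation built from `pg_stat_activity` and `pg_blocking_pids()`, not necessarily the exact query we used:

```python
# Approximate diagnostic query to capture a snapshot like the table above,
# using pg_stat_activity joined with pg_blocking_pids(). Column choices are
# illustrative.
import psycopg2

conn = psycopg2.connect("dbname=user_datastore")  # adjust DSN/credentials as needed
with conn.cursor() as cur:
    cur.execute("""
        SELECT backend_type,
               backend_start,
               state_change,
               pid,
               pg_blocking_pids(pid) AS blocking_pids,
               wait_event_type || ': ' || wait_event AS wait_event,
               datname,
               query
          FROM pg_stat_activity
         WHERE datname = 'user_datastore'
         ORDER BY backend_start
    """)
    for row in cur.fetchall():
        print(row)
conn.close()
```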
More information: https://stackoverflow.com/questions/32145189/avoid-exclusive-access-locks-on-referenced-tables-when-dropping-in-postgresql
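The approach suggested there is to make the DROP fail fast instead of letting it queue behind a long-running reader while blocking every new query on the table. A minimal sketch of that idea, assuming psycopg2; the table name, timeout, and retry policy below are illustrative and are not the actual xloader code:

```python
# Minimal sketch: set a short lock_timeout so DROP TABLE fails fast instead of
# queueing behind long-running readers (and blocking every new query on the
# table), then back off and retry. Illustrative only, not the xloader fix.
import time
import psycopg2
from psycopg2 import sql

RESOURCE_TABLE = "775e9be7-eecd-47d6-b72f-18c3cebc137d"  # example resource id

conn = psycopg2.connect("dbname=user_datastore")  # adjust DSN/credentials as needed
for attempt in range(5):
    try:
        with conn:  # one transaction per attempt; commits or rolls back
            with conn.cursor() as cur:
                cur.execute("SET LOCAL lock_timeout = '5s'")
                cur.execute(
                    sql.SQL("DROP TABLE IF EXISTS {} CASCADE")
                    .format(sql.Identifier(RESOURCE_TABLE))
                )
        break
    except psycopg2.errors.LockNotAvailable:
        time.sleep(2 ** attempt)  # give the blocking reader time to finish
conn.close()
```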
@ThrawnCA this can be closed when https://github.com/ckan/ckanext-xloader/commit/e6687a280aafc3c2fdba367fa04c290df3ac04dd is included in ckan/ckanext-xloader.