turkle
turkle copied to clipboard
NotSupportedError at /batch/3/accept_next_task/ on Postgres
Hi,
I've been trying to deploy Turkle and use Postgres with it. However, I run through this issue where whenever I click on accept next task, it's says
NotSupportedError at /batch/1/accept_next_task/
FOR UPDATE cannot be applied to the nullable side of an outer join
Request Method: GET
Request URL: https://turkle.herokuapp.com/batch/1/accept_next_task/
Django Version: 2.2.24
Exception Type: NotSupportedError
Exception Value:
FOR UPDATE cannot be applied to the nullable side of an outer join
Exception Location: /app/.heroku/python/lib/python3.9/site-packages/django/db/backends/utils.py in _execute, line 84
Python Executable: /app/.heroku/python/bin/python
Python Version: 3.9.9
Python Path:
['/app/.heroku/python/bin',
'/app',
'/app/.heroku/python/lib/python39.zip',
'/app/.heroku/python/lib/python3.9',
'/app/.heroku/python/lib/python3.9/lib-dynload',
'/app/.heroku/python/lib/python3.9/site-packages']
Server time: Thu, 20 Jan 2022 10:33:18 +0000
Do you have any solution?
Thanks for the report. We've been using MySQL and hadn't run into this. Looks like there is an incompatibility between the query that Django is creating Postgres. Next step is to track this down to a particular line of Python/Django code and then see if there is anything in the Django documentation on this.
I'm rebuilding our Docker images and as a part of that will be working on a Postgres one. That will make it easier for us to test again Postgres and will hopefully have some insight into this.
I've been able to reproduce this using Postgres 14.1:
turkle/views.py, line 144, in accept_next_task
try:
with transaction.atomic():
batch = Batch.objects.get(id=batch_id)
# Lock access to all Tasks available to current user in the batch
# Force evaluation of the query set with len()
len(batch.available_task_ids_for(request.user).select_for_update())
This is being caused by some recent code we added to use transactions to prevent two annotators from working on the same task if they happen to click accept task at basically the same time.
Will be looking into this further.
Here is the query:
SELECT "turkle_task"."id" FROM "turkle_task" LEFT OUTER JOIN "turkle_taskassignment" ON ("turkle_task"."id" = "turkle_taskassignment"."task_id") WHERE ("turkle_task"."batch_id" = 1 AND NOT "turkle_task"."completed" AND "turkle_taskassignment"."id" IS NULL) FOR UPDATE; args=(1,)
This query is issued when the user clicks "accept next task" and is meant to lock all the tasks for a batch that aren't assigned while we figure out which one to assign to the user. Once that assignment is made, we release the lock.
Documentation here: https://docs.djangoproject.com/en/3.1/ref/models/querysets/#select-for-update
Ideas:
- Lock all tasks for that batch until the next task is assigned. This should be a really fast operation but happens when you have hundreds of people working on the same batch. Note that this would prevent anyone from submitting an assignment or getting one assigned to them until the first assignment is finished.
- Query for the tasks that are available and then use those task ids for the lock. I think this should work. I sketched some sequences on paper and I don't see a race condiition.
A shorter term solution is to do a check for postgres and skip the transaction/lock if so. There is a small possibility of a race condition that results in 2 assignments when you asked for one. We did experience that at least once on our production site so it is possible.
@cash What about catching the exception and logging a warning? More complicated but perhaps better, I wonder if postgres has a consistency/transaction mechanism that could be used instead to prevent the db op when that's the backend being used.
@TomLippincott How about locking all unfinished tasks for a batch? This will lock a few more additional tasks than the current code, but it should be fast lock. This would entail changing:
len(batch.available_task_ids_for(request.user).select_for_update())
to
len(batch.unfinished_tasks().select_for_update())
If this works for postgres, we could either put a conditional statement in there that tests for postgres or just switch to it for all dbs. Could you test this?
@cash Hacking that line into my installation fixes the issue. I suppose one could test with something like 'settings.DATABASES["default"]["type"].endswith("postgresql")' etc, but seems like that would create problems for any more-complex DB configuration. Maybe there's some way to determine which database a given model is served from?
looks like django.db.connection.vendor is the way to go. Django uses it for database specific stuff like this:
def process_lhs(self, compiler, connection):
lhs, lhs_params = super().process_lhs(compiler, connection)
if connection.vendor == "mysql":
return "LOWER(%s)" % lhs, lhs_params
return lhs, lhs_params