Thread Safety Issue in django-treebeard's `get_root()` method
What happened
The get_root() method in api_app/models.py (Job model) has a known thread-safety issue with django-treebeard. When multiple concurrent requests access the job tree structure, a MultipleObjectsReturned exception can occur. The current workaround catches this exception and uses .first() as a fallback, which can lead to:
- Inconsistent results - Different requests may get different root jobs depending on database ordering
- Data integrity issues - If multiple roots exist due to race conditions, operations may occur on the wrong job tree
- Silent failures - The error is caught without proper logging or alerting
The code explicitly acknowledges this issue with the comment: # django treebeard is not thread safe
def get_root(self):
    if self.is_root():
        return self
    try:
        return super().get_root()
    except self.MultipleObjectsReturned:
        # django treebeard is not thread safe
        # this is not a really valid solution, but it will work for now
        return self.objects.filter(path=self.path[0 : self.steplen]).first()
Environment
- OS: Any (platform-independent issue)
- IntelOwl version: All versions using django-treebeard (current develop branch)
What did you expect to happen
The get_root() method should always return the correct and consistent root job node, even under concurrent access. Tree operations should be thread-safe and maintain data integrity.
How to reproduce your issue
- Set up IntelOwl with multiple Celery workers
- Create a parent job with multiple child jobs (pivot/investigation scenario)
- Trigger concurrent requests that access the job tree (e.g., multiple API calls to retrieve job status)
- Under high concurrency, the MultipleObjectsReturned exception may occur
Alternatively: Write a stress test that creates concurrent child jobs and calls get_root() simultaneously from multiple threads.
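The stress test mentioned above could start from a small harness like the following. This is only a sketch using the standard library: `call_concurrently` is a hypothetical helper name, and the barrier is there to make all threads invoke the callable at roughly the same moment. It tallies return values and exception types so that a `MultipleObjectsReturned` (or differing roots) would show up in the summary.

```python
import threading
from collections import Counter


def call_concurrently(fn, workers=8):
    """Call fn() from `workers` threads at once and tally the outcomes.

    Hypothetical helper: returns a Counter keyed by ("ok", result) or
    ("error", exception class name). Results must be hashable.
    """
    barrier = threading.Barrier(workers)
    outcomes = []  # list.append is safe to call from several threads

    def task():
        barrier.wait()  # release all threads together
        try:
            outcomes.append(("ok", fn()))
        except Exception as exc:
            outcomes.append(("error", type(exc).__name__))

    threads = [threading.Thread(target=task) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return Counter(outcomes)


if __name__ == "__main__":
    # Stand-in callable; in a real test this might be
    # `lambda: child_job.get_root().pk` (assumption, not shown here).
    print(call_concurrently(lambda: 1, workers=4))
```

In a Django test this would presumably need `TransactionTestCase` rather than `TestCase`, so that worker threads can see the committed rows, and each thread should close its own database connection (`django.db.connection.close()`) when done. A healthy run would report a single `("ok", <root pk>)` key.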
Error messages and logs
The exception is currently caught silently, but the underlying error would be:
django.core.exceptions.MultipleObjectsReturned: get() returned more than one Job -- it returned X!
Suggested Solutions
Option 1: Use Database Locking (Recommended)
import logging

from django.db import transaction

logger = logging.getLogger(__name__)

def get_root(self):
    if self.is_root():
        return self
    with transaction.atomic():
        try:
            return super().get_root()
        except self.MultipleObjectsReturned:
            logger.error(
                f"Multiple root nodes found for Job {self.pk}. "
                "This indicates a data consistency issue."
            )
            # Lock the candidate rows and pick the lowest pk deterministically
            return Job.objects.select_for_update().filter(
                path=self.path[0:self.steplen]
            ).order_by('pk').first()
Option 2: Add Locks to Tree Modification Operations
Wrap add_child() and add_root() in atomic transactions with select_for_update().
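To illustrate why serializing tree modifications helps, here is a simplified in-memory model of a read-then-write race. It is not django-treebeard's code: `PathStore` and its methods are hypothetical names, and `threading.Lock` merely stands in for a database row lock such as the one `select_for_update()` takes inside `transaction.atomic()`.

```python
import threading


class PathStore:
    """Toy stand-in for a table of root paths (hypothetical, not treebeard)."""

    def __init__(self):
        self.paths = []
        self._lock = threading.Lock()

    def _next_path(self):
        # Read step: derive the next path from the current maximum.
        last = max(self.paths, default="0000")
        return f"{int(last) + 1:04d}"

    def add_root_unsafe(self, barrier):
        path = self._next_path()  # read
        barrier.wait()            # force every thread to read before any write
        self.paths.append(path)   # write

    def add_root_locked(self):
        # Read and write happen under one lock, so no two threads
        # can compute the same path.
        with self._lock:
            self.paths.append(self._next_path())


def _run(worker, n):
    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def simulate_unsafe(n=2):
    store, barrier = PathStore(), threading.Barrier(n)
    _run(lambda: store.add_root_unsafe(barrier), n)
    return sorted(store.paths)


def simulate_locked(n=2):
    store = PathStore()
    _run(store.add_root_locked, n)
    return sorted(store.paths)


if __name__ == "__main__":
    print(simulate_unsafe())  # duplicate paths
    print(simulate_locked())  # distinct paths
```

In the unsafe variant both threads read the same state and insert the same path, which is the kind of duplicate that would make a later `get()` on that path return more than one row. The locked variant yields distinct paths because the read and the write are one critical section.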
@fgibertoni what do you think?