
Thread Safety Issue in django-treebeard's `get_root()` method

Open • srijan2607 opened this issue 3 weeks ago • 1 comment

What happened

The get_root() method in api_app/models.py (Job model) has a known thread-safety issue with django-treebeard. When multiple concurrent requests access the job tree structure, a MultipleObjectsReturned exception can occur. The current workaround catches this exception and uses .first() as a fallback, which can lead to:

  1. Inconsistent results - Different requests may get different root jobs depending on database ordering
  2. Data integrity issues - If multiple roots exist due to race conditions, operations may occur on the wrong job tree
  3. Silent failures - The error is caught without proper logging or alerting

The code explicitly acknowledges this issue with the comment: # django treebeard is not thread safe

def get_root(self):
    if self.is_root():
        return self
    try:
        return super().get_root()
    except self.MultipleObjectsReturned:
        # django treebeard is not thread safe
        # this is not a really valid solution, but it will work for now
        return self.objects.filter(path=self.path[0 : self.steplen]).first()
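
For context, the fallback query relies on how a materialized path encodes ancestry: the first steplen characters of a node's path are the path of its root. A minimal sketch of that slicing, assuming treebeard's default step length of 4 (the root_path helper and the sample path are illustrative only and do not exist in IntelOwl):

```python
def root_path(path: str, steplen: int = 4) -> str:
    """Return the root segment of a materialized path.

    Each tree level contributes `steplen` characters, so the first
    `steplen` characters identify the root of the tree.
    """
    return path[0:steplen]


# A node three levels deep: root "0001", child "0002", grandchild "0003".
grandchild_path = "000100020003"
print(root_path(grandchild_path))
```

If two rows ever matched that root segment, get() would raise MultipleObjectsReturned, which is the situation the fallback is papering over.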

Environment

  1. OS: Any (platform-independent issue)
  2. IntelOwl version: All versions using django-treebeard (current develop branch)

What did you expect to happen

The get_root() method should always return the correct and consistent root job node, even under concurrent access. Tree operations should be thread-safe and maintain data integrity.

How to reproduce your issue

  1. Set up IntelOwl with multiple Celery workers
  2. Create a parent job with multiple child jobs (pivot/investigation scenario)
  3. Trigger concurrent requests that access the job tree (e.g., multiple API calls to retrieve job status)
  4. Under high concurrency, the MultipleObjectsReturned exception may occur

Alternatively: Write a stress test that creates concurrent child jobs and calls get_root() simultaneously from multiple threads.
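
A rough sketch of what such a stress harness could look like, using only the standard library. The hammer helper is a hypothetical name; the callable passed to it stands in for something like lambda: child_job.get_root().pk, where child_job is assumed to be a Job fetched beforehand.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def hammer(call, n_threads=16):
    """Invoke `call` from n_threads threads at (nearly) the same moment.

    Returns (results, errors): the values returned and the exceptions raised.
    """
    # The barrier holds every thread until all of them are ready,
    # which maximizes the overlap between the calls.
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()
        return call()

    results, errors = [], []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(worker) for _ in range(n_threads)]
        for future in futures:
            try:
                results.append(future.result())
            except Exception as exc:  # collect instead of failing fast
                errors.append(exc)
    return results, errors
```

In a Django test this would probably need TransactionTestCase rather than TestCase, since each thread uses its own database connection and would not see rows created inside the test's uncommitted transaction. A passing run would show a single distinct root pk in results and an empty errors list.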

Error messages and logs

The exception is currently caught silently, but the underlying error would be:

django.core.exceptions.MultipleObjectsReturned: get() returned more than one Job -- it returned X!

Suggested Solutions

Option 1: Use Database Locking (Recommended)

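import logging

from django.db import transaction

logger = logging.getLogger(__name__)

# Method on the Job model
def get_root(self):
    if self.is_root():
        return self
    # select_for_update() must run inside a transaction
    with transaction.atomic():
        try:
            return super().get_root()
        except self.MultipleObjectsReturned:
            logger.error(
                "Multiple root nodes found for Job %s. "
                "This indicates a data consistency issue.",
                self.pk,
            )
            # Lock the candidate rows and pick the lowest pk deterministically
            return (
                Job.objects.select_for_update()
                .filter(path=self.path[0 : self.steplen])
                .order_by("pk")
                .first()
            )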

Option 2: Add Locks to Tree Modification Operations

Wrap add_child() and add_root() in atomic transactions with select_for_update().
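
To illustrate why serializing the modification step matters, here is an in-process analogy, not Django code. ToyTree and add_roots_concurrently are hypothetical names, and the threading.Lock only stands in for the database lock that select_for_update() inside transaction.atomic() would provide. With multiple Celery workers running as separate processes, a process-local lock like this one would not be enough and the lock would have to live in the database.

```python
import threading


class ToyTree:
    """In-memory stand-in for a materialized-path table (not treebeard)."""

    steplen = 4

    def __init__(self):
        self.paths = []                # plays the role of the path column
        self._lock = threading.Lock()  # plays the role of a database lock

    def add_root(self):
        # Allocating a path is a read-then-write step. Without the lock,
        # two threads could read the same count and allocate the same path.
        with self._lock:
            next_index = len(self.paths) + 1
            path = str(next_index).zfill(self.steplen)
            self.paths.append(path)
            return path


def add_roots_concurrently(tree, n_threads=8, per_thread=50):
    """Call tree.add_root() from several threads and return all paths."""

    def worker():
        for _ in range(per_thread):
            tree.add_root()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return tree.paths
```

Because the read and the write happen under one lock, every allocated path is unique regardless of how the threads interleave. The database-level version of Option 2 aims for the same guarantee across processes.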

srijan2607 • Dec 03 '25 14:12

@fgibertoni what do you think?

srijan2607 • Dec 03 '25 14:12