
Memory leak

Open ekorian opened this issue 6 months ago • 2 comments

Memory Leak in celery-batches 0.10.0 Main Process

Description: A severe memory leak (up to ~40 GB of resident memory) occurs in the main Celery process when using celery-batches 0.10.0. Downgrading to celery-batches 0.9.0 resolves the issue.

Environment:

  • Python: 3.10.12
  • Celery: 5.3.6
  • celery-batches: 0.10.0 (issue) / 0.9.0 (working)
  • OS: Ubuntu 22.04 (jammy)

Task Configuration:

from celery_batches import Batches

@celery.task(
    queue='write_tasks',
    base=Batches,
    flush_every=500,     # flush after 500 buffered requests
    flush_interval=10,   # or after 10 seconds, whichever comes first
    time_limit=60
)
def writer_task(requests):
    # requests is a list of SimpleRequest objects
    for request in requests:
        ...  # task implementation

Worker Configuration:

celery -A app.make_celery.celery worker \
    --concurrency=10 \
    -Q write_tasks \
    --prefetch-multiplier=500 \
    --max-tasks-per-child=500 \
    --max-memory-per-child=400000

Issue:

  1. The main Celery process (not the worker processes) continuously accumulates memory
  2. Memory usage grows to ~40GB over time
  3. No memory is released until worker restart
  4. Worker processes stay within their memory limits
  5. Issue persists even with:
    • Different prefetch multiplier values
    • Different batch sizes
    • Memory limits per child
    • Task result ignored
    • Gossip/mingle/heartbeat disabled

Technical Analysis: The memory leak appears to be related to how the Batches class handles request objects in its buffer. Looking at the source code:

  1. The Batches class maintains two queues:

    self._buffer: Queue[Request] = Queue()
    self._pending: Queue[Request] = Queue()
    
  2. In version 0.10.0, there appears to be an issue with the request object lifecycle in the _do_flush method:

    def _do_flush(self) -> None:
        ready_requests = []
        all_requests = list(consume_queue(self._buffer)) + list(consume_queue(self._pending))
        # ... processing ...
    
  3. The memory leak likely occurs because:

    • Request objects in the queues may not be properly garbage collected
    • The SimpleRequest objects created during serialization may be holding references
    • The _pending queue might not be properly cleared in some error cases
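
The first point above can be illustrated in isolation. This is a minimal sketch (plain stdlib, not celery-batches code) showing that objects sitting in a `queue.Queue` are strongly referenced and cannot be garbage collected until the queue is drained, which is the suspected failure mode if `_pending` is never fully consumed:

```python
import gc
import queue
import weakref

class Request:
    """Stand-in for a buffered task request object."""
    def __init__(self, payload):
        self.payload = payload

buffer = queue.Queue()
req = Request(b"x" * 1024)
ref = weakref.ref(req)

buffer.put(req)
del req
gc.collect()
# Still alive: the queue holds the only remaining reference.
assert ref() is not None

buffer.get()  # draining the queue drops that reference
gc.collect()
# Now collectable.
assert ref() is None
```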

Steps to Reproduce:

  1. Install celery-batches 0.10.0
  2. Configure a batch task as shown above
  3. Send a continuous stream of tasks to the queue
  4. Monitor main process memory usage (not worker processes)
  5. Observe memory growth in the main Celery process
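
Step 4 can be scripted. A sketch that samples the resident set size (RSS) of a given PID from `/proc` (Linux only, matching the Ubuntu 22.04 environment above); `MAIN_PID` is a placeholder for the actual main Celery process PID:

```python
import os
import time

def rss_kib(pid: int) -> int:
    """Return VmRSS in KiB for the given PID, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

if __name__ == "__main__":
    MAIN_PID = os.getpid()  # placeholder: substitute the main Celery PID
    for _ in range(3):
        print(f"rss={rss_kib(MAIN_PID)} KiB")
        time.sleep(1)
```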

Workaround: Downgrading to celery-batches 0.9.0 completely resolves the memory leak.

Additional Notes:

  • The leak appears to be in the main Celery process, not in the worker processes
  • Memory usage grows regardless of actual batch size or workload
  • No errors or warnings are logged
  • Memory is not released until process restart
  • The issue seems to be related to request object lifecycle management in the batching mechanism

Would you like me to provide any additional information or logs?

ekorian avatar Jun 12 '25 20:06 ekorian

Do you have a minimal example? Do you know what object is leaking?

clokep avatar Jun 12 '25 21:06 clokep

Also was this issue written by AI? It's very hard to read.

clokep avatar Jun 12 '25 22:06 clokep