BUG FIX: daemonic processes are not allowed to have children

Open rishiraj opened this issue 1 year ago • 2 comments

Error (PDF file used in Marker: crowd.pdf from the benchmark dataset):

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 216, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 20, in use_marker
    result = markdown_extractor.extract(content, config)
  File "/home/user/app/marker/markdown_extractor.py", line 31, in extract
    full_text, images, out_meta = convert_single_pdf(inputtmpfile.name, self.model_lst, max_pages=params.max_pages, langs=params.langs, batch_multiplier=params.batch_multiplier)
  File "/usr/local/lib/python3.10/site-packages/marker/convert.py", line 86, in convert_single_pdf
    surya_detection(doc, pages, detection_model, batch_multiplier=batch_multiplier)
  File "/usr/local/lib/python3.10/site-packages/marker/ocr/detection.py", line 24, in surya_detection
    predictions = batch_text_detection(images, det_model, processor, batch_size=int(get_batch_size() * batch_multiplier))
  File "/usr/local/lib/python3.10/site-packages/surya/detection.py", line 135, in batch_text_detection
    results = list(executor.map(parallel_get_lines, preds, orig_sizes))
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 766, in map
    results = super().map(partial(_process_chunk, fn),
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 610, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 610, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 738, in submit
    self._start_executor_manager_thread()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 678, in _start_executor_manager_thread
    self._launch_processes()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 705, in _launch_processes
    self._spawn_process()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 714, in _spawn_process
    p.start()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children

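Reason: Python's multiprocessing module does not allow a daemonic process to start child processes. The ProcessPoolExecutor in surya/detection.py is likely being used from a process that is itself a daemon, so starting its worker processes trips the assertion in Process.start() shown at the end of the traceback.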
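
To make the reason above concrete, here is a minimal sketch that reproduces the same assertion outside of Marker and surya. It assumes a POSIX system where the fork start method is available; `_square`, `_daemon_body`, and `reproduce` are hypothetical names used only for illustration and are not part of either library.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def _square(x):
    # Trivial stand-in for real per-item work.
    return x * x


def _daemon_body(conn):
    """Runs inside a daemon process and tries to start a process pool."""
    try:
        with ProcessPoolExecutor(max_workers=1) as executor:
            # Submitting work makes the pool start a worker process,
            # which multiprocessing refuses to do from a daemon.
            result = list(executor.map(_square, [1, 2, 3]))
        conn.send(f"ok: {result}")
    except AssertionError as exc:
        conn.send(f"AssertionError: {exc}")
    finally:
        conn.close()


def reproduce(timeout=30.0):
    """Start a daemon process and return the message it reports back."""
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_daemon_body, args=(writer,), daemon=True)
    proc.start()
    writer.close()  # the parent only reads
    try:
        return reader.recv() if reader.poll(timeout) else "timed out"
    finally:
        proc.join(timeout)
        reader.close()
```

Calling `reproduce()` should return a message containing "daemonic processes are not allowed to have children", matching the last line of the traceback. Note that the check is an `assert`, so it is skipped when Python runs with `-O`.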

Fix: Either make sure the parent process is not a daemon, or refactor the code to use ThreadPoolExecutor, which starts threads rather than child processes and is therefore allowed inside a daemonic process. Threads work best when the workload is I/O-bound. For CPU-bound tasks that need process-level parallelism, the process pool has to be created from a non-daemonic process.
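
The ThreadPoolExecutor route can be sketched as follows. This is an illustration under stated assumptions, not the change made in surya: `_line_count`, `_daemon_body`, and `count_lines_in_daemon` are hypothetical names, `_line_count` is only a stand-in for the real per-page work, and the fork start method (POSIX) is assumed so the daemon process can be created.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor


def _line_count(text):
    # Hypothetical stand-in for per-page post-processing work.
    return len(text.splitlines())


def _daemon_body(conn, pages):
    """Runs inside a daemon process; starting threads is allowed here."""
    try:
        with ThreadPoolExecutor(max_workers=2) as executor:
            # map() yields results in input order, as ProcessPoolExecutor.map() does.
            conn.send(list(executor.map(_line_count, pages)))
    finally:
        conn.close()


def count_lines_in_daemon(pages, timeout=30.0):
    """Run the thread pool inside a daemon process and return its results."""
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_daemon_body, args=(writer, pages), daemon=True)
    proc.start()
    writer.close()  # the parent only reads
    try:
        if not reader.poll(timeout):
            raise TimeoutError("daemon process did not answer in time")
        return reader.recv()
    finally:
        proc.join(timeout)
        reader.close()
```

Because `executor.map` has the same call shape on both executor classes, the swap is usually a one-line change at the call site. The trade-off is that pure-Python CPU-bound work in threads is still limited by the GIL on standard CPython builds.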

rishiraj • Jun 15 '24 19:06

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

github-actions[bot] • Jun 15 '24 19:06

I have read the CLA Document and I hereby sign the CLA

rishiraj • Jun 15 '24 19:06

Hi @rishiraj, we've switched to using ThreadPoolExecutor here: https://github.com/VikParuchuri/surya/pull/235

iammosespaulr • Oct 31 '24 10:10