BUG FIX: daemonic processes are not allowed to have children
Error (reproduced by running Marker on crowd.pdf from the benchmark dataset):
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 216, in thread_wrapper
res = future.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/user/app/app.py", line 20, in use_marker
result = markdown_extractor.extract(content, config)
File "/home/user/app/marker/markdown_extractor.py", line 31, in extract
full_text, images, out_meta = convert_single_pdf(inputtmpfile.name, self.model_lst, max_pages=params.max_pages, langs=params.langs, batch_multiplier=params.batch_multiplier)
File "/usr/local/lib/python3.10/site-packages/marker/convert.py", line 86, in convert_single_pdf
surya_detection(doc, pages, detection_model, batch_multiplier=batch_multiplier)
File "/usr/local/lib/python3.10/site-packages/marker/ocr/detection.py", line 24, in surya_detection
predictions = batch_text_detection(images, det_model, processor, batch_size=int(get_batch_size() * batch_multiplier))
File "/usr/local/lib/python3.10/site-packages/surya/detection.py", line 135, in batch_text_detection
results = list(executor.map(parallel_get_lines, preds, orig_sizes))
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 766, in map
results = super().map(partial(_process_chunk, fn),
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 610, in map
fs = [self.submit(fn, *args) for args in zip(*iterables)]
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 610, in <listcomp>
fs = [self.submit(fn, *args) for args in zip(*iterables)]
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 738, in submit
self._start_executor_manager_thread()
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 678, in _start_executor_manager_thread
self._launch_processes()
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 705, in _launch_processes
self._spawn_process()
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 714, in _spawn_process
p.start()
File "/usr/local/lib/python3.10/multiprocessing/process.py", line 118, in start
assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
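Reason: Python's multiprocessing does not allow a daemonic process to start child processes. In the traceback, surya's batch_text_detection submits work to a ProcessPoolExecutor, which tries to spawn worker processes. The call is reached through spaces/zero/wrappers.py, so it is likely running inside a process that is itself a daemon, and p.start() fails the assertion.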
Fix: Either ensure that the parent process is not a daemon, or refactor the code to use ThreadPoolExecutor instead of ProcessPoolExecutor, since threads can be started from a daemonic process. Threads work best when I/O-bound tasks dominate the workload. For CPU-bound tasks, threads are constrained by the GIL, so the alternative is to avoid creating a process pool from within a daemonic process.
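A minimal sketch of the fallback described above, not the actual surya code: it picks a thread pool when the current process is daemonic and a process pool otherwise. The names pick_executor_class and count_lines are hypothetical and exist only for illustration.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional


def pick_executor_class(is_daemon: Optional[bool] = None):
    """Return an executor class that is safe to use in the current process.

    Hypothetical helper: if is_daemon is None, the daemon flag of the
    current process is used.
    """
    if is_daemon is None:
        is_daemon = multiprocessing.current_process().daemon
    # A daemonic process may not start children, so a process pool would
    # fail on the first submit with the AssertionError from the traceback.
    return ThreadPoolExecutor if is_daemon else ProcessPoolExecutor


def count_lines(text: str) -> int:
    # Hypothetical stand-in for the per-item work done in the pool.
    return len(text.splitlines())
```

A caller would then write something like `with pick_executor_class()(max_workers=4) as executor: results = list(executor.map(count_lines, pages))`. The trade-off is that the thread fallback gives up true parallelism for CPU-bound work, which may or may not matter for a given workload.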
Hi @rishiraj, we've switched to using ThreadPoolExecutor here: https://github.com/VikParuchuri/surya/pull/235