parallel=True uses all available processes
I discovered that when we set parallel=True, the PDFHandler class utilizes all available processes.
However, some projects do not need, or do not benefit from, using every process at once. To address this, I modified the read_pdf function in io.py and the PDFHandler parse method to accept an additional parameter, cpu_count, for the desired number of processes. cpu_count defaults to None, which keeps the current behavior of using all available processes; passing a number limits how many processes are used. I may have overlooked other places where additional restrictions or checks are needed, which is why I am opening this issue.
def read_pdf(
    filepath: Union[StrByteType, Path],
    pages="1",
    password=None,
    flavor="lattice",
    suppress_stdout=False,
    parallel=False,
    cpu_count=None,
    layout_kwargs=None,
    debug=False,
    **kwargs,
):
tables = p.parse(
    flavor=flavor,
    suppress_stdout=suppress_stdout,
    parallel=parallel,
    cpu_count=cpu_count,
    layout_kwargs=layout_kwargs,
    **kwargs,
)
with TemporaryDirectory() as tempdir:
    if cpu_count is None:
        cpu_count = mp.cpu_count()
    elif mp.cpu_count() == 1:
        cpu_count = 1
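One area that may need an extra restriction is validation of the requested value. The snippet above only handles None and the single-core case, so a request for 0, a negative number, or more processes than the machine has would pass through unchanged. Below is a minimal sketch of one way to normalize the value. The helper name resolve_cpu_count is hypothetical and is not part of Camelot; it only assumes the same convention as above, that None means "use everything".

```python
import multiprocessing as mp
from typing import Optional


def resolve_cpu_count(requested: Optional[int] = None) -> int:
    """Return the number of worker processes to use.

    Hypothetical helper, not part of Camelot. None keeps the
    "use all available processes" behavior; an explicit value is
    validated and capped at what the machine reports.
    """
    available = mp.cpu_count()
    if requested is None:
        return available
    if requested < 1:
        # Reject 0 and negative values instead of passing them on.
        raise ValueError(f"cpu_count must be >= 1, got {requested}")
    # Never ask for more processes than the machine reports.
    return min(requested, available)
```

With something like this, the parse method could call cpu_count = resolve_cpu_count(cpu_count) once and take the parallel path only when the result is greater than 1. Whether to cap silently or raise on an oversized request is a design choice; capping is shown here only because it keeps existing calls working.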