grobid_client_python
Error
Hello
I trust you are all well. I've been encountering an error for the past few days while attempting to process full text from a batch using the Python client. Despite my efforts, the error persists. My system specifications include an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz with 8GB RAM. I've tried adjusting parameters such as concurrency in the grobid.yaml file, but unfortunately, this hasn't resolved the issue. I'm reaching out to see if there are any additional steps I can take to address this problem. Thank you for your assistance.
ERROR [2024-04-13 20:31:33,322] org.grobid.service.process.GrobidRestProcessFiles: Could not get an engine from the pool within configured time. Sending service unavailable.
Hi @NeoH2333,
the default config of the client config.json file uses a batch_size of 100, which is too big. This number should be consistent with the number in the grobid.yaml.
If this does not solve the problem, could you share more information, including both the config.json and grobid.yaml files?
Hello !
@NeoH2333 8GB is not enough for applying processFulltextDocument on more than one PDF at the same time in a safe manner, especially if you are using Deep Learning models on CPU only. Consider using 16GB if possible. Otherwise, set the --n argument of the client side to 1.
@lfoppiano batch_size is only for managing the acquisition of files by the ThreadPoolExecutor. It is not related to the server load or to concurrency in grobid.yaml, so it can stay at 100 or 1000 without any impact on the server (it will just use a bit more memory on the client side to store the list of paths to the PDFs).
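To illustrate the point above, here is a minimal, self-contained sketch of why a batch size does not change how many requests run at once. It is not the client's actual code: peak_concurrency, fake_request, and n_workers are hypothetical names, and the sleep only stands in for an HTTP call to the server. The only thing it assumes is that files are handed to a ThreadPoolExecutor in batches, as described above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(paths, batch_size, n_workers):
    """Process paths in batches; return the highest number of simultaneous calls."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def fake_request(path):
        # Hypothetical stand-in for one request to the server.
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)
        with lock:
            active -= 1
        return path

    # batch_size only controls how many paths are handed over at a time.
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # n_workers is what bounds the number of in-flight requests.
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            list(pool.map(fake_request, batch))
    return peak


paths = [f"doc_{i}.pdf" for i in range(40)]
print(peak_concurrency(paths, batch_size=10, n_workers=2))
print(peak_concurrency(paths, batch_size=40, n_workers=2))
```

In both calls the peak never exceeds n_workers, whatever the batch size. In this sketch n_workers plays the role of the client's --n argument, which is why lowering --n (rather than batch_size) is what reduces the load on the server.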
Ahh, sorry indeed, the batch_size does not impact the number of concurrent requests... 🙏 @NeoH2333 ignore my comment please.