grobid_client_python Error

Hello

I trust you are all well. For the past few days I have been getting the error below when processing a batch of PDFs in full-text mode with the Python client, and it persists despite my efforts. My system has an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz and 8GB of RAM. I've tried adjusting parameters such as concurrency in the grobid.yaml file, but this hasn't resolved the issue. Are there any additional steps I can take to address this problem? Thank you for your assistance.

ERROR [2024-04-13 20:31:33,322] org.grobid.service.process.GrobidRestProcessFiles: Could not get an engine from the pool within configured time. Sending service unavailable.
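The log line says the server gave up waiting for a free engine and answered 503 (Service Unavailable). Below is a minimal Python model of that behaviour. It assumes a fixed pool of `concurrency` engines and a bounded wait `pool_max_wait`. Both names, and the `EnginePool` and `simulate` helpers, are hypothetical and belong to this sketch only. It is not GROBID's actual Java implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


class EnginePool:
    """Hypothetical model of a server-side pool of `concurrency` engines.

    Not GROBID's implementation: names and behaviour are assumptions made
    to illustrate the log message.
    """

    def __init__(self, concurrency: int, pool_max_wait: float) -> None:
        self._slots = threading.BoundedSemaphore(concurrency)
        self._max_wait = pool_max_wait

    def handle(self, work_seconds: float) -> int:
        # Wait at most pool_max_wait seconds for a free engine.
        if not self._slots.acquire(timeout=self._max_wait):
            # "Could not get an engine from the pool within configured time."
            return 503  # Service Unavailable
        try:
            time.sleep(work_seconds)  # stand-in for processing one PDF
            return 200
        finally:
            self._slots.release()


def simulate(concurrency: int, pool_max_wait: float,
             n_requests: int, work_seconds: float) -> List[int]:
    """Send n_requests simultaneous requests and collect the status codes."""
    pool = EnginePool(concurrency, pool_max_wait)
    with ThreadPoolExecutor(max_workers=n_requests) as executor:
        return list(executor.map(lambda _: pool.handle(work_seconds), range(n_requests)))


print(sorted(simulate(concurrency=2, pool_max_wait=0.05, n_requests=4, work_seconds=0.5)))
```

In this model, four simultaneous requests against two engines should leave the two late requests timing out with 503 while the first two succeed. Sending fewer requests at once avoids the timeout. Raising the wait time only helps if the engines free up before it expires.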

Apr 13 '24 20:04 NeoH2333

Hi @NeoH2333, the default client config.json uses a batch_size of 100, which is too big. This number should be consistent with the concurrency value in grobid.yaml.

If this does not solve the problem, could you share more information, including both config.json and grobid.yaml files?

Apr 14 '24 08:04 lfoppiano

Hello!

@NeoH2333 8GB is not enough to safely run processFulltextDocument on more than one PDF at a time, especially if you are using Deep Learning models on CPU only. Consider using 16GB if possible; otherwise, set the client's --n argument to 1.
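Below is a minimal sketch that applies this advice through the client's Python API, following the usage pattern shown in the grobid_client_python README. It assumes the `grobid_client` package is installed and that a GROBID server is reachable at the address given in `config.json`. The wrapper name `process_one_at_a_time` is hypothetical. On the command line, the equivalent is passing `--n 1`.

```python
def process_one_at_a_time(input_dir: str, config_path: str = "./config.json", n: int = 1) -> None:
    """Hypothetical wrapper: send PDFs in input_dir to processFulltextDocument, n at a time."""
    # Imported lazily so this file can be loaded without the package installed.
    from grobid_client.grobid_client import GrobidClient

    # The client reads config.json (server URL, batch_size, ...).
    client = GrobidClient(config_path=config_path)
    # n bounds the number of concurrent requests; n=1 sends one PDF at a time.
    client.process("processFulltextDocument", input_dir, n=n)
```

For example, `process_one_at_a_time("./pdfs")` would process the folder sequentially, trading throughput for a lower peak memory use on the server.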

@lfoppiano batch_size only controls how files are handed to the ThreadPoolExecutor. It is not related to the server load or to the concurrency setting in grobid.yaml. It can stay at 100 or 1000 without any impact on the server; it just uses a bit more memory on the client side to store the list of PDF paths.
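The distinction can be illustrated with a small standard-library model. It is loosely based on the description above and is not grobid_client's actual code. `peak_in_flight` and `fake_request` are hypothetical names, and the sleep stands in for an HTTP call. batch_size only slices the list of paths, while `max_workers=n` caps how many requests run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def peak_in_flight(paths: List[str], batch_size: int, n: int, work_seconds: float = 0.01) -> int:
    """Return the highest number of simultaneous (fake) requests observed.

    Hypothetical model of a client loop, not grobid_client's actual code.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_request(path: str) -> None:
        # Stand-in for one HTTP call to the GROBID server for `path`.
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(work_seconds)
        with lock:
            in_flight -= 1

    # batch_size only decides how many paths are collected per round...
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # ...while max_workers=n caps how many requests run at the same time.
        with ThreadPoolExecutor(max_workers=n) as executor:
            list(executor.map(fake_request, batch))
    return peak


pdfs = [f"doc{i}.pdf" for i in range(20)]
print(peak_in_flight(pdfs, batch_size=100, n=1))   # 1
print(peak_in_flight(pdfs, batch_size=1000, n=1))  # 1
```

Changing batch_size from 100 to 1000 leaves the peak unchanged in this model. Only `n` moves it, which is why lowering `--n` reduces server load and raising batch_size does not.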

Apr 14 '24 12:04 kermitt2

Ahh, sorry, indeed batch_size does not affect the number of concurrent requests... 🙏 @NeoH2333, please ignore my comment.

Apr 15 '24 07:04 lfoppiano
