
No retry mechanism on CrossrefRequest results in hanging.

Open AaronNGray opened this issue 3 years ago • 12 comments

There does not seem to be any timeout and retry mechanism in CrossrefRequest, which results in occasional hanging of the overall Grobid REST requests.
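The kind of safeguard being asked for can be sketched as a generic timeout-plus-retry wrapper. This is an illustration only, not GROBID's actual code; the helper name, backoff constants, and the simulated flaky call are all made up:

```java
import java.util.concurrent.*;

public class RetryWithTimeout {
    // Run `task` with a hard timeout; retry up to `maxRetries` times with
    // exponential backoff. Hypothetical helper, not the GROBID implementation.
    static <T> T callWithRetry(Callable<T> task, long timeoutMs, int maxRetries)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 0; ; attempt++) {
                Future<T> future = pool.submit(task);
                try {
                    return future.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException | ExecutionException e) {
                    future.cancel(true); // release the stuck thread instead of hanging
                    if (attempt >= maxRetries) throw e;
                    Thread.sleep((long) (100 * Math.pow(2, attempt))); // backoff
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky request: fails twice, then succeeds on the third attempt.
        final int[] calls = {0};
        String result = callWithRetry(() -> {
            if (calls[0]++ < 2) throw new RuntimeException("HTTP 503");
            return "ok";
        }, 1000, 3);
        System.out.println(result); // prints "ok"
    }
}
```

The key point is `future.get(timeoutMs, ...)` combined with `future.cancel(true)`: without a bounded wait, one unresponsive Crossref call can pin a worker thread indefinitely.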

AaronNGray avatar Feb 01 '21 00:02 AaronNGray

@AaronNGray The PR #725 introduces various improvements to the mechanism for calling Crossref, including adding the timeout. Feel free to have a look at it if you like.

lfoppiano avatar Mar 16 '21 10:03 lfoppiano

I am using v0.6.2 and sometimes CrossrefRequest gets blocked.

Grobid then hangs and cannot release its resources.

(screenshots attached)

elonzh avatar Jul 03 '21 16:07 elonzh

Normally it's fixed by #725, which addresses exactly this. (I don't know why the milestone was changed to 0.6.2, but it's in the 0.7.0 release - or master - as indicated initially.)

kermitt2 avatar Jul 03 '21 17:07 kermitt2

(I don't know why the milestone was changed to 0.6.2, but it's in the 0.7.0 release - or master - as indicated initially)

Double-checked: it's indeed in 0.6.2, so the timeout should apply and release the thread. It's weird.

kermitt2 avatar Jul 03 '21 17:07 kermitt2

This seems like quite a serious problem: consolidation with Crossref almost always makes Grobid hang without returning a response.

elonzh avatar Jul 08 '21 13:07 elonzh

mmm I just processed 2000 PDFs with Crossref consolidation for the header without issue, just a bit slower than biblio-glutton. Independently from the timeout, which should apply but apparently does not, do you indicate a mailto in config/grobid.yml for the polite use of CrossRef?
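For reference, the polite-pool contact is set in the consolidation section of `config/grobid.yml`. A sketch only; the exact key names vary between GROBID versions, so check the config file shipped with your release:

```yaml
consolidation:
  service: "crossref"
  crossref:
    mailto: "you@example.com"   # enables Crossref's "polite" pool
```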

kermitt2 avatar Jul 08 '21 14:07 kermitt2

mmm I just processed 2000 PDFs with Crossref consolidation for the header without issue, just a bit slower than biblio-glutton. Independently from the timeout, which should apply but apparently does not, do you indicate a mailto in config/grobid.yml for the polite use of CrossRef?

mmmm, try consolidating the citations.

elonzh avatar Jul 08 '21 14:07 elonzh

I will try to reproduce it tomorrow.

elonzh avatar Jul 08 '21 14:07 elonzh

Consolidating citations now, with 8 threads (Crossref mailto address in the config file): it becomes slow after a while as the Crossref service lowers the availability, but it is still going.

PDF processing   9% │█▌              │  189/1943 (0:09:02 / 1:23:50) 
PDF processing  21% │███▍            │  418/1943 (0:31:13 / 1:53:53) 
PDF processing  25% │████            │  487/1943 (0:40:53 / 2:02:13) 
PDF processing  25% │████            │  495/1943 (0:50:18 / 2:27:08) 
PDF processing  32% │█████▎          │  639/1943 (1:09:07 / 2:21:02) 
PDF processing  37% │██████          │  729/1943 (1:21:59 / 2:16:31) 
PDF processing  48% │███████▋        │  937/1943 (1:54:03 / 2:02:26) 
...

For 1942 articles, we have in principle 103498 calls to CrossRef.

It's not clear that GROBID is actually hanging; it may simply be the 60s timeout combined with a very low number of allowed requests... so we could be spending several tens of minutes per document just waiting for the timeout of each reference.
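To put rough numbers on that back-of-envelope reasoning (the 10% timeout rate is an assumed figure for illustration, not something measured in the run above):

```java
public class TimeoutCost {
    public static void main(String[] args) {
        int articles = 1942;
        long crossrefCalls = 103498;        // figures from the run above
        double callsPerArticle = (double) crossrefCalls / articles;

        double assumedTimeoutRate = 0.10;   // assumption: 10% of calls hit the timeout
        long timeoutSeconds = 60;           // the configured request timeout
        double wastedMinutesPerArticle =
                callsPerArticle * assumedTimeoutRate * timeoutSeconds / 60.0;

        // roughly 53 calls per article, and around 5 minutes per article
        // spent doing nothing but waiting on timeouts
        System.out.printf("%.1f calls/article, ~%.1f min/article lost to timeouts%n",
                callsPerArticle, wastedMinutesPerArticle);
    }
}
```

Even a modest failure rate therefore looks like a hang from the outside, since each stuck reference blocks for the full timeout.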

Anyway, for consolidating references, I think that if we want to scale with safety and predictable performance, the only solution is a local biblio-glutton service instead of the Crossref REST API.

kermitt2 avatar Jul 08 '21 18:07 kermitt2

Is it possible to have a cumulative retry count?

I did look at running a local Crossref; they provide a torrent download of the data. But there is no import utility, and the main source code is written in Clojure (a Lisp dialect), so I had to put it on the back burner for now.

AaronNGray avatar Jul 08 '21 18:07 AaronNGray

Is it possible to have a cumulative retry count?

There is no retry :) I see the CrossRef REST API as only OK for occasional usage. When it starts to fail, Grobid is already putting too much stress on this web API (even with a Metadata Plus subscription), and it means it is time to use biblio-glutton.

I did look at running a local Crossref; they provide a torrent download of the data. But there is no import utility, and the main source code is written in Clojure (a Lisp dialect), so I had to put it on the back burner for now.

We developed https://github.com/kermitt2/biblio-glutton exactly for this. It provides better results (enriched Crossref entries, slightly more accurate matching), scales horizontally, and basically solves the problems with the Crossref API for production usage.
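Pointing GROBID at a local biblio-glutton instance is then just a configuration change. Again a sketch; the exact keys and default port depend on the GROBID and biblio-glutton versions, so verify against your installed config:

```yaml
consolidation:
  service: "glutton"
  glutton:
    url: "http://localhost:8080"   # assumed local biblio-glutton endpoint
```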

I looked initially at cayenne (https://github.com/CrossRef/cayenne), but at the time it was not usable outside the internal Crossref environment and I didn't manage to understand the code (it's not just the Clojure - I developed in Lisp and Haskell in the past, but the code was not documented or commented at all). With Luca we found it easier and faster to develop our own API based on the CrossRef data dump :)

kermitt2 avatar Jul 08 '21 19:07 kermitt2

(I don't know why the milestone was changed to 0.6.2, but it's in the 0.7.0 release - or master - as indicated initially)

Double-checked: it's indeed in 0.6.2, so the timeout should apply and release the thread. It's weird.

Hi, I am also impacted by this (randomly and seemingly infrequently: when Crossref consolidation is on - mailto is set politely - the container CPU jumps to 100% and it stops processing any other requests), and I need to use Grobid in production.

Is 0.7.0 expected to bring any stability improvements over 0.6.2?

If not, I'd definitely consider switching to biblio-glutton to avoid issues in prod (or disable consolidation entirely).

(P.S. this is also related to https://github.com/kermitt2/grobid/issues/755, just that I started experiencing not only failed requests but also container stability issues with 0.6.2.)

davidefiocco avatar Sep 16 '21 10:09 davidefiocco