grobid icon indicating copy to clipboard operation
grobid copied to clipboard

Improve JEP initialisation and DL model loading error logging

Open lfoppiano opened this issue 1 year ago • 3 comments

This PR tries to show an error at startup when JEP is not correctly initialised:

For example:

INFO  [2022-08-09 14:23:57,617] org.grobid.core.jni.DeLFTModel: Loading DeLFT model for quantities with architecture BidLSTM_CRF...
running thread: 1
INFO  [2022-08-09 14:23:57,623] org.grobid.core.jni.JEPThreadPool: Creating JEP instance for thread 16
ERROR [2022-08-09 14:23:57,633] org.grobid.core.jni.JEPThreadPool: JEP initialisation failed somewhere.

The startup will continue, however at least there is a trace in the log file.

lfoppiano avatar Aug 09 '22 14:08 lfoppiano

Thanks !

Here it's not an error of the JEP initialization itself (like linking the native JEP library).

It's an error of the loading the deep learning model, which might be due to different reasons not necessarily related to JEP (model does not exist, wrong model version, etc.). Then the difficulty is to pass the python error trace into java so that it is readable.

The error reporting behavior right now is not very satisfactory I agree. Each model has its own JEP instance in its own thread. This JEP instance loads the model. In case of success, we propagate the Keras architecture display from python into the Java logs (but it can still fail later for other reasons :D). In case of failure of the loading of the model, I think it's often a kind of silent failure, and then when grobid tries to use it, we have a model access error (because the initial loading fails).

I am not so sure how to identify the cause of error without diving again in the code, we might need to look at the python error trace with regex :/

Something that can be done easily before calling JEP: check that the deep learning files exist in grobid home.

Note for us: if something is changed in DeLFTModel.java (the sequence labelling) it has to be changed in DeLFTClassifierModel.java too for consistency.

kermitt2 avatar Aug 09 '22 17:08 kermitt2

Thanks for the feedback.

If I understood correctly, the root of this issues comes when running on separate threads the exceptions are lost in a limbo. The proper way would be to have a sort of messaging between threads.

I think if we can display the stacktrace of the exception we will have precious hints on where the error comes from (e.g. the JepException usually show additional python code if the error comes from there).

I've added some logs before throwing, so that something will appear in the log.

Note for us: if something is changed in DeLFTModel.java (the sequence labelling) it has to be changed in DeLFTClassifierModel.java too for consistency.

I've made the same changes in the DeLFTClassifierModel.java too.

lfoppiano avatar Aug 10 '22 08:08 lfoppiano

Coverage Status

Coverage decreased (-0.008%) to 39.876% when pulling 2a47ebf51f60b6f7a56beed263e0748e18b97c33 on improvement/jep-init-error-display into b0939e062d4541897dda9c514dc5e5ad5f8c0461 on master.

coveralls avatar Aug 10 '22 08:08 coveralls