
bst_create_icons does not finish with certain documents

Open aw-bib opened this issue 10 years ago • 7 comments

We tried to create a bunch of icons for our FullTexts collection, as they were suppressed upon original ingestion. So we called:

$ inv $ib/bibtasklet -N createicons -T bst_create_icons -a recid=False -a collection=FullTexts -a icon_sizes=180,640,1440 -u admin

(Note: the calling syntax for handling a whole collection is not clear, cf. issue #2192. The above worked with a slightly modified version of the tasklet that ignores the recid altogether and goes straight for the collection.)

This started the process, but icon creation seems to fail for certain documents. Unfortunately, the call to the external tools does not return, so the bibtasklet hangs in the bibsched queue and, even worse, keeps other tasks from proceeding, as the tasklet is stuck "about to go to sleep" forever. There is no message indicating a hanging job in the tasklet's logs.

An example of a failing document can be found here: http://bib-pubdb1.desy.de/record/139620

aw-bib, Sep 02 '14 06:09

To protect against never-ending external processes, we have invenio.shellutils.run_process_with_timeout:

def run_process_with_timeout(args, filename_in=None, filename_out=None, filename_err=None, cwd=None, timeout=CFG_MISCUTIL_DEFAULT_PROCESS_TIMEOUT, sudo=None):
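For anyone unfamiliar with the pattern, here is a minimal sketch of what such a wrapper does. It is written against the Python 3 standard library, not Invenio's shellutils, whose actual implementation may differ. The name run_with_timeout and the return convention (exit code, or None on timeout) are assumptions for illustration only.

```python
import subprocess


def run_with_timeout(args, timeout=60):
    """Run an external command; return its exit code, or None on timeout.

    Hypothetical helper for illustration, not Invenio's run_process_with_timeout.
    """
    try:
        # On timeout, subprocess.run kills the child, waits for it to exit,
        # and then re-raises TimeoutExpired.
        completed = subprocess.run(args, timeout=timeout,
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

For example, run_with_timeout(["sleep", "5"], timeout=0.5) gives up after half a second and returns None instead of blocking the caller.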

@ludmilamarian, have you seen something similar on CDS?

kaplun, Sep 02 '14 11:09

Given what is mentioned in #2192 this is related to Invenio 1.1.3.

kaplun, Sep 03 '14 20:09

Tracking it further down, it seems pdftk is not finishing its job.

Actually, we found that pdftk cannot handle the file in question and fails. Sometimes it returns; sometimes it just hangs. In the latter case it behaves pretty badly, eating up 100% CPU, so you might end up with quite a load if bst_create_icons runs against a larger collection. (Luckily, however, it hangs at the point in question, so you can clean up by hand and get rid of all the zombies afterwards... ;)
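Because a hung tool spins at 100% CPU and leaves processes behind, a timeout alone may not be enough: everything the tool spawned should die too, and the direct child should be reaped so it does not linger as a zombie. Below is a minimal POSIX-only sketch of that idea, assuming Python 3. The helper name run_in_group is hypothetical, and this is not what Invenio 1.1 does.

```python
import os
import signal
import subprocess


def run_in_group(args, timeout):
    """Run args in its own process group; on timeout, kill the whole group.

    Returns (returncode, timed_out). POSIX only; hypothetical helper.
    """
    # start_new_session=True runs setsid() in the child, so the child and
    # anything it forks share a process group whose ID is the child's PID.
    proc = subprocess.Popen(args, start_new_session=True,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        try:
            # SIGKILL cannot be caught or ignored, so a spinning tool can't resist it.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        # wait() reaps the child so it does not remain as a zombie.
        return proc.wait(), True
```

A child killed this way reports a negative return code (-9 for SIGKILL), which a caller can log as a hang rather than an ordinary failure.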

Martin found some mentions on the web (unfortunately he didn't give me a pointer) that there is, was, or still is some issue with signal handling when pdftk is called via system()/exec()/fork() or friends from Python, PHP, or the like. Perhaps this helps. Perhaps the pdftk requirement ends up at >x.yy?

Our pdftk is v1.44 from SL5.10.

aw-bib, Sep 04 '14 06:09

@egabancho @ludmilamarian Any updates on this one?

tiborsimko, Nov 14 '16 10:11

Unfortunately, we did not experience this issue, so we can't really provide a solution. What I can say is that we did large numbers of conversions and everything was OK for us. We are currently using pdftk v2.02. I assume things are better now, the last message on this thread was in 2014.

ludmilamarian, Nov 14 '16 11:11

I assume things are better now, the last message on this thread was in 2014.

...assuming that pdftk and its toolchain were updated in the meantime, this might be the case, yes. Note that on the quite common SL 6.x, pdftk is still v1.44.

Note also that if you have PDFs from a broad range of publishers/processing tools, it may well happen that some parts of the toolchain cannot handle them properly. (As usual, you're quite lucky in HEP here, as it is, again as usual, quite homogeneous.)

Anyway, I'm not sure whether it's possible to detect such a hanging tool and kill the job in these cases, say by some timeout detection.

aw-bib, Nov 14 '16 12:11

Such functionality exists in Invenio: https://github.com/inveniosoftware/invenio/blob/maint-1.1/modules/miscutil/lib/shellutils.py#L158

but is not used to create icons: https://github.com/inveniosoftware/invenio/blob/maint-1.1/modules/websubmit/lib/websubmit_icon_creator.py#L311
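If a timeout were wired into icon creation, it could also address the missing log message mentioned at the start of the thread: the loop over a collection would skip a hung document, log it, and carry on, so the tasklet finishes. A rough sketch of that control flow in Python 3 follows. The function convert_all, the (recid, argv) job format, and the logger name are all hypothetical; the real websubmit_icon_creator code is structured differently.

```python
import logging
import subprocess

log = logging.getLogger("create_icons_sketch")  # hypothetical logger name


def convert_all(jobs, timeout=120):
    """Run one conversion command per record without blocking on a hung tool.

    jobs: iterable of (recid, argv) pairs (hypothetical format).
    Returns three lists of recids: (succeeded, failed, timed_out).
    """
    succeeded, failed, timed_out = [], [], []
    for recid, argv in jobs:
        try:
            rc = subprocess.run(argv, timeout=timeout,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL).returncode
        except subprocess.TimeoutExpired:
            # The child has already been killed; note it and move on so the
            # rest of the collection still gets processed.
            log.warning("record %s: conversion timed out after %ss", recid, timeout)
            timed_out.append(recid)
            continue
        if rc == 0:
            succeeded.append(recid)
        else:
            log.warning("record %s: conversion failed with exit code %s", recid, rc)
            failed.append(recid)
    return succeeded, failed, timed_out
```

The timed_out list then names exactly the documents (such as record 139620 above) that need a manual look.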

Is there anyone willing to contribute a fix?

kaplun, Nov 14 '16 12:11