
Program crashes but ghost jobs keep going on

Open • MatteoSchiavinato opened this issue 7 years ago • 6 comments

I recently ran OrthoFiller with 28 cores, and the run crashed after a while because of a problem with one of the input files. I was also timing the run, and the timer reported when the node considered the job terminated (correctly or not). However, 30-40 minutes after that, the ghost jobs were still visible in top (using 0% CPU and 0% RAM).

MatteoSchiavinato, Apr 25 '17 14:04

Hi Matteo,

What was the input problem, and were you able to fix it?

Thanks,

Michael


mpdunne, Apr 25 '17 15:04

"What was the input problem, and were you able to fix it?"

Nothing script-related: I hadn't indexed the FASTA files. I was more concerned about finding 28 ghost processes in top 30 minutes after it had crashed!

MatteoSchiavinato, Apr 26 '17 12:04

Okay, thanks, I’ll look into that! If it’s something I can get the script to check for at the start, that would save waiting for a failure; it would also give me a better chance of exiting the program gracefully and dealing with those processes directly.
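A start-up check of the kind described here could look like the minimal sketch below. It assumes, based on Matteo's comment, that the tool expects a samtools-style `<file>.fai` index next to each FASTA file; `missing_fasta_indexes` and `preflight_or_exit` are hypothetical names, not part of OrthoFiller.

```python
import os
import sys
from typing import List


def missing_fasta_indexes(fasta_paths: List[str]) -> List[str]:
    """Return the FASTA files that have no samtools-style '<file>.fai' index next to them."""
    return [path for path in fasta_paths if not os.path.isfile(path + ".fai")]


def preflight_or_exit(fasta_paths: List[str]) -> None:
    """Hypothetical start-up check: stop before any worker process is created."""
    missing = missing_fasta_indexes(fasta_paths)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Unindexed FASTA file(s); run `samtools faidx` on:\n  "
                 + "\n  ".join(missing))
```

Calling `preflight_or_exit(...)` at the very top of the main routine, before any pool is created, means a missing index ends the run immediately with a clear message instead of failing later inside a worker.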


mpdunne, Apr 26 '17 12:04

"What was the input problem, and were you able to fix it?"

I narrowed down the problem: I have one GTF file where not all CDS lengths are multiples of 3, one where some coordinates are duplicated, and one where some coordinates are not found in the FASTA file. The multiple-of-3 warning seems to be the last one raised, so it is probably why the run gets stuck.

This time there was no output from the timer, so the process is still running. However, none of the 20 cores is using any CPU or RAM; they seem idle!
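The three GTF problems listed in this comment can all be detected in a single pass before any work is distributed. The sketch below is an illustration, not OrthoFiller's own check: it assumes standard 9-column, tab-separated GTF with a `transcript_id "..."` attribute, and `check_gtf_cds`, `GtfProblems` and the `seq_lengths` dictionary (sequence name to length, e.g. read from a `.fai`) are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


class GtfProblems(NamedTuple):
    not_multiple_of_3: List[str]   # transcript IDs whose total CDS length % 3 != 0
    duplicated: List[str]          # "tid seq:start-end:strand"
    outside_fasta: List[str]       # "tid seq:start-end"


def check_gtf_cds(gtf_text: str, seq_lengths: Dict[str, int]) -> GtfProblems:
    """Scan the CDS lines of a GTF for length, duplicate and out-of-range problems."""
    cds_len: Dict[str, int] = defaultdict(int)
    seen = set()
    problems = GtfProblems([], [], [])
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        seq, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        match = TRANSCRIPT_ID.search(fields[8])
        tid = match.group(1) if match else "<no transcript_id>"
        key = (tid, seq, start, end, strand)
        if key in seen:
            problems.duplicated.append(f"{tid} {seq}:{start}-{end}:{strand}")
            continue  # do not count a duplicated segment twice
        seen.add(key)
        if seq not in seq_lengths or end > seq_lengths[seq]:
            problems.outside_fasta.append(f"{tid} {seq}:{start}-{end}")
        cds_len[tid] += end - start + 1  # GTF coordinates are 1-based and inclusive
    problems.not_multiple_of_3.extend(
        sorted(tid for tid, length in cds_len.items() if length % 3 != 0))
    return problems
```

Whether a non-multiple-of-3 CDS should abort the run or only trigger a warning (partial gene models can legitimately have one) is a policy choice; the sketch only reports.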

MatteoSchiavinato, Apr 26 '17 12:04

Follow-up on this topic:

If the input files pass the consistency checks, the program continues normally and, even if it crashes later, leaves no ghost processes behind. If the checks fail, the worker processes stay up as idle ghosts. Maybe .join() is never reached in that case?

MatteoSchiavinato, May 04 '17 08:05

Hi Matteo,

I've looked into the issue with ghost processes, and it should be fixed in the next update.

All the best,

Michael

mpdunne, May 30 '17 13:05