orthofiller
Program crashes but ghost jobs keep going
I recently ran OrthoFiller on 28 cores, and it crashed after some time because of a problem with an input file. I was also timing the run with the time command, which prints its output once the node considers the job terminated (correctly or not). However, 30-40 minutes later, the ghost jobs were still visible in top (using 0% CPU and 0% of the RAM).
Hi Matteo,
What was the input problem, and were you able to fix it?
Thanks,
Michael
(Sent by email in reply to Matteo Schiavinato's report of 25 April 2017, 15:59, issue #5.)
What was the input problem, and were you able to fix it?
Nothing script-related: I hadn't indexed the FASTA files. I was more concerned about finding 28 ghost processes in top 30 minutes after it had crashed!
Okay thanks, I’ll look into that! If it’s something the script can check for at the beginning, that would save time waiting for a failure. It would also give me a better chance of exiting the program gracefully and dealing with those processes directly.
(Sent by email in reply to Matteo Schiavinato's comment of 26 April 2017, 13:27, issue #5.)
What was the input problem, and were you able to fix it?
I narrowed the problem down: I have one GTF file where the CDS lengths are not all multiples of 3, one where some coordinates are duplicated, and one where some of the coordinates are not found in the FASTA file. The multiple-of-3 warning seems to be the last one raised, so it is probably the reason the program gets stuck.
This time I didn't get any time output, so the process is still running. However, none of the 20 cores is using CPU or RAM; they seem idle!
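For reference, the three GTF problems described above could be caught up front, before any worker is started. Here is a minimal Python sketch of such a check. It is not OrthoFiller's actual code: check_gtf and fasta_lengths (a mapping from sequence name to sequence length) are hypothetical names, "duplicated" is taken to mean the same CDS interval repeated within one transcript, and "not found" is taken to mean either a missing sequence name or an end coordinate past the end of the sequence.

```python
import collections
import re

# Extracts the transcript_id value from the GTF attribute column.
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')


def check_gtf(gtf_lines, fasta_lengths):
    """Return a list of problems found in the CDS records of a GTF file.

    gtf_lines: iterable of GTF lines.
    fasta_lengths: dict of sequence name -> length (hypothetical input,
    e.g. built from the FASTA file beforehand).
    """
    problems = []
    cds_total = collections.Counter()  # transcript -> summed CDS length
    seen = set()                       # CDS intervals already encountered
    for n, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        seqname, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        match = TRANSCRIPT_RE.search(fields[8])
        transcript = match.group(1) if match else "line %d" % n
        if seqname not in fasta_lengths:
            problems.append("line %d: sequence %s not in FASTA" % (n, seqname))
        elif end > fasta_lengths[seqname]:
            problems.append("line %d: end %d is beyond %s" % (n, end, seqname))
        key = (transcript, seqname, start, end, strand)
        if key in seen:
            problems.append("line %d: duplicated CDS coordinates" % n)
            continue  # do not count the same interval twice
        seen.add(key)
        cds_total[transcript] += end - start + 1  # GTF ends are inclusive
    for transcript, total in cds_total.items():
        if total % 3:
            problems.append("%s: CDS length %d is not a multiple of 3"
                            % (transcript, total))
    return problems
```

Running a check like this at startup and exiting when the returned list is non-empty would report all three kinds of problem at once, instead of waiting for the run to get stuck.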
Follow up on this topic:
If the consistency checks on the input files pass, the program continues normally and, if it crashes later, leaves no ghost processes behind. If the checks fail, the workers stay up as ghosts without doing anything. Maybe .join() has not been called at that point?
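To illustrate what I mean, here is a minimal Python sketch of the pattern I would expect, assuming the workers come from a multiprocessing pool (I have not checked how OrthoFiller actually creates them). run_with_cleanup, validate and context are hypothetical names. The checks run before the pool exists, and the pool is terminated and joined on any error, so no idle workers are left behind.

```python
import multiprocessing


def run_with_cleanup(worker, tasks, validate, processes=2, context=None):
    """Validate the inputs, then map tasks over a pool that is always torn down.

    worker must be picklable (e.g. a module-level function);
    validate(tasks) should raise an exception on bad input.
    """
    validate(tasks)  # fail fast: no worker process exists yet
    ctx = context or multiprocessing.get_context()
    pool = ctx.Pool(processes)
    try:
        results = pool.map(worker, tasks)
        pool.close()      # normal path: no more work, let the workers exit
        return results
    except BaseException:
        pool.terminate()  # error path: stop the workers immediately
        raise
    finally:
        pool.join()       # in both cases, wait for the workers to be reaped
```

Note that join() is only valid after close() or terminate(), which is why both paths call one of them before the finally block runs.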
Hi Matteo,
I've looked into the issue with ghost processes, and this should be fixed in the next update.
All the best,
Michael