
Issue parsing queue job id

Open sepidehparhami opened this issue 8 years ago • 1 comments

I ran into an issue when running MISO on the cluster: the job ID is not parsed correctly from the queue output. The error occurs at line 233 of ./misopy/cluster_utils.py (job_id = int(output[0].split(".")[0])). This is the message:

ValueError: invalid literal for int() with base 10: 'JSV: No h_data is set; setting default h_data=1G (if this value is too small, the job will fail)\nYour job 260032 ("gene_psi_batch_0_time_07-27-17_14-28-44'

So the parsing statement fails whenever the queue prints a warning ahead of the job submission line. I noticed that the optional --no-wait flag circumvents the issue, so anyone else encountering this should use it for the time being.
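
For illustration, here is a minimal sketch of a more tolerant parse. It assumes the submission line always has the form `Your job <id>` as in the message above; `parse_job_id` and `JOB_ID_PATTERN` are hypothetical names, not part of misopy, and other queue systems will likely need a different pattern.

```python
import re

# Hypothetical helper, not part of misopy. The pattern is an assumption
# based only on the "Your job 260032 (..." text in the error message above.
JOB_ID_PATTERN = re.compile(r"Your job (\d+)")


def parse_job_id(output):
    """Return the job id as an int, or None if no id can be found.

    Searches the whole submission output, so warning lines printed
    before the "Your job ..." line are simply skipped.
    """
    match = JOB_ID_PATTERN.search(output)
    if match is None:
        return None
    return int(match.group(1))


# The (truncated) output quoted in the traceback above.
sample = (
    "JSV: No h_data is set; setting default h_data=1G "
    "(if this value is too small, the job will fail)\n"
    'Your job 260032 ("gene_psi_batch_0_time_07-27-17_14-28-44'
)
print(parse_job_id(sample))  # prints 260032
```

Returning None instead of raising lets the caller decide whether to fall back to the --no-wait behavior when the output is not recognized.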

sepidehparhami, Jul 28 '17 17:07

@sepidehparhami we need better error handling here for sure. The bigger issue is that there isn't a general way to parse the output of each cluster submission system to tell if the job finished. Each system's output is a bit different. We'd need to switch to a framework that handles parallel and distributed jobs to do this better. As you said, that's why we have the no wait flag, so that you can submit jobs and then it's left up to the user to decide whether they want to wait and to implement that wait step in a system-specific way.
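
A user-side wait step could look roughly like the sketch below. It is only an illustration under stated assumptions: `wait_for_job` is a hypothetical name, and the `is_running` callable is where the system-specific check (however a given cluster reports job status) would be supplied by the user.

```python
import time


def wait_for_job(is_running, poll_seconds=30.0, timeout_seconds=None,
                 sleep=time.sleep):
    """Poll until is_running() returns False.

    is_running: zero-argument callable supplied by the user; it wraps
        whatever system-specific status check the cluster offers.
    Returns True if the job finished, False if the timeout was reached.
    """
    waited = 0.0
    while is_running():
        if timeout_seconds is not None and waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True


# Usage with a stand-in status check that reports "running" twice.
states = iter([True, True, False])
finished = wait_for_job(lambda: next(states), poll_seconds=0)
print(finished)  # prints True
```

Keeping the status check as a parameter means the polling loop itself stays the same across queue systems; only the callable changes.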

yarden, Jul 28 '17 18:07