sarek terminated for an unknown reason -- Likely it has been terminated by the external system

Description of the bug

I do not know why these errors occur.

Command used and terminal output

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_HAPLOTYPECALLER:GATK4_HAPLOTYPECALLER (AK58_C_1)` terminated for an unknown reason -- Likely it has been terminated by the external system

Command executed:

  gatk --java-options "-Xmx163840M -XX:-UsePerfData" \
      HaplotypeCaller \
      --input AK58_C_1.md.cram \
      --output AK58_C_1.haplotypecaller.chr2A_part1_1-384157900.g.vcf.gz \
      --reference wheat_AK58v4MP.genome_part.fa \
       \
      --intervals chr2A_part1_1-384157900.bed \
       \
       \
      --tmp-dir . \
      -ERC GVCF
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_HAPLOTYPECALLER:GATK4_HAPLOTYPECALLER":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /public/home/fanrong/work_lei/01_htt/01_240927_well_bse_bsr/01_bsr/02_sarek/02_vcf/work/bd/a8da3f055c7df258c4271504de32d7

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

No response

Oct 07 '24 09:10 fan040

This happens when the scheduler kills your job, typically because it runs out of memory or time. To investigate, can you take a look at the .command.log file of the failing task? You can find it in the work directory: /public/home/fanrong/work_lei/01_htt/01_240927_well_bse_bsr/01_bsr/02_sarek/02_vcf/work/bd/a8da3f055c7df258c4271504de32d7
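A minimal sketch of how to check which log files a failed task left behind, assuming a POSIX-like bash shell on the cluster. The helper name `check_task_dir` is hypothetical. It looks for the files Nextflow normally writes in a task work directory (`.command.log`, `.command.err`, `.command.out`, `.exitcode`, `.command.begin`); if none of the "started" markers exist, the job most likely never ran on a compute node.

```shell
#!/usr/bin/env bash
# Summarise what a Nextflow task work directory contains.
# check_task_dir is a hypothetical helper, not part of Nextflow or nf-core.
check_task_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  fi
  # Print the tail of each log/status file that exists
  local f
  for f in .command.log .command.err .command.out .exitcode; do
    if [ -f "$dir/$f" ]; then
      echo "== $f (last 20 lines) =="
      tail -n 20 "$dir/$f"
    fi
  done
  # .command.begin is created when the task script starts running;
  # without it (and without .command.log) the job likely never started.
  if [ ! -f "$dir/.command.begin" ] && [ ! -f "$dir/.command.log" ]; then
    echo "no .command.begin or .command.log: the job probably never started on a compute node"
  fi
}
```

Usage: `check_task_dir /public/home/fanrong/work_lei/01_htt/01_240927_well_bse_bsr/01_bsr/02_sarek/02_vcf/work/bd/a8da3f055c7df258c4271504de32d7`. If `.exitcode` is present, its value (for example 137 for a SIGKILL) usually points to the scheduler's memory or time limit.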

Oct 07 '24 10:10 FriederikeHanssen

There is no .command.log in the work directory; it only contains .command.run and .command.sh.

Oct 08 '24 01:10 fan040

@fan040, I've had this recently, with only .command.run and .command.sh present. It means that your scheduler tried to schedule the job but something made it fail before it ever started running. I'd suggest you just retry. If it happens often, talk to your cluster sysadmins.

I've been seeing this on a virtual Slurm cluster created by AWS ParallelCluster. In that case, I think it happens when the worker node does not start properly (perhaps because no nodes of that type were available).

One complication is that no exit code is returned to Nextflow, so the standard retry strategy, which checks for a specific subset of exit codes and automatically retries only those, doesn't work. I ended up setting it to always retry (errorStrategy = 'retry').
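A minimal sketch of the "always retry" workaround, assuming sarek is launched with `nextflow run nf-core/sarek`. The file name `retry.config` and the value `maxRetries = 3` are illustrative assumptions; without `maxRetries`, Nextflow retries a task only once.

```shell
#!/usr/bin/env bash
# Write a custom Nextflow config that retries every failed task,
# regardless of exit code. retry.config and maxRetries = 3 are illustrative.
cat > retry.config <<'EOF'
process {
    // Jobs killed before they start return no exit code, so an
    // exit-code-based retry rule never matches them: retry unconditionally.
    errorStrategy = 'retry'
    maxRetries    = 3
}
EOF

# Then re-launch with the extra config; -resume reuses tasks that already
# completed. Append your usual sarek options (--input, --outdir, -profile):
#   nextflow run nf-core/sarek -c retry.config -resume
```

Retrying unconditionally also retries genuine failures (for example a bad input file), so keeping `maxRetries` small avoids wasting cluster time on errors that will never succeed.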

Nov 08 '24 15:11 SPPearce

Thank you for the response :)

Nov 08 '24 15:11 fan040