
rattle correction step giving error

Open dvirdi01 opened this issue 1 year ago • 9 comments

I ran rattle correct on my input files through snakemake. I get an error message saying this:

Error in rule cluster_correction:
    jobid: 13
    input: data/.../.../samplefile.fastq
    output: data/RATTLE_out/samplefile/corrected.fq, data/RATTLE_out/samplefile/uncorrected.fq, data/RATTLE_out/samplefile/consensi.fq
    log: log/RATTLE_log/samplefile_correct.out, log/RATTLE_log/samplefile_correct.err (check log file(s) for error details)
    shell:
        /storage/.../.../bin/RATTLE/rattle correct -i data/.../.../samplefile.fastq -c data/RATTLE_out/samplefile/clusters.out -o data/RATTLE_out/samplefile/corrected.fq data/RATTLE_out/samplefile/uncorrected.fq data/RATTLE_out/samplefile/consensi.fq -t 48 > log/RATTLE_log/samplefile.out 2> log/RATTLE_log/samplefile.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Error executing rule cluster_correction on cluster (jobid: 13, external: 2761217, jobscript: /storage/.../.../.../.snakemake/tmp.tz0fhacf/snakejob.cluster_correction.13.sh). For error details see the cluster log and the log files of the involved rule(s).

When I open samplefile.err it says: "Reading fasta file... Done" and when I open samplefile.out it is empty.

I also get this message below:

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2761217.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: valiant1: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2761217.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

I gave it 100 GB of RAM to begin with, but I guess it wasn't enough. Is there a way to know how much RAM I need to give it before I run the snakemake command?

dvirdi01 avatar Oct 10 '23 17:10 dvirdi01

Hi,

You can have a look at the memory usage figure in our paper.

Otherwise, I need more information like the number of reads or the fastq file size to give you a RAM estimation.

Your 'samplefile.out' file should not be empty, because it is a binary file. You need to look at the file size to check whether it is empty.

Eileen

eileen-xue avatar Oct 11 '23 01:10 eileen-xue
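The two pieces of information asked for above (the number of reads and the FASTQ file size), together with the size check on output files, can be gathered with a few lines of Python. This is a minimal sketch and not part of RATTLE: the function names are made up for illustration, and the read count assumes the usual four lines per FASTQ record.

```python
import gzip
import os


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    # Plain and gzip-compressed files are both opened in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def is_empty(path):
    """Return True if the file is missing or its size is 0 bytes."""
    return not os.path.exists(path) or os.path.getsize(path) == 0


def file_size_gb(path):
    """Return the file size in gigabytes (10**9 bytes)."""
    return os.path.getsize(path) / 1e9
```

Checking the size with `is_empty` works for binary files such as clusters.out too, since it never tries to read the content as text.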

Hi, I checked the output files for some of the processes that did run. It created consensi.fq, uncorrected.fq and corrected.fq, but they are all 0 bytes. I am not sure why this is happening. This was my snakemake rule:

rule cluster_correction:
    input:
        "data/.../.../{sample}.fastq"
    output:
        touch("data/.../{sample}/corrected.fq"),
        touch("data/.../{sample}/uncorrected.fq"),
        touch("data/.../{sample}/consensi.fq")
    params:
        clusters = "data/.../{sample}/clusters.out"
    log:
        out = "log/.../{sample}_correct.out",
        err = "log/.../{sample}_correct.err"
    threads: 48
    resources:
        mem = 100
    shell:
        """/storage/.../.../.../.../rattle correct \
        -i {input} \
        -c {params.clusters} \
        -o {output} \
        -t {threads} \
        > {log.out} \
        2> {log.err} """

To add on: the same happened with my rattle cluster_summary step. It created a tsv file, but it was also 0 bytes.

dvirdi01 avatar Oct 11 '23 15:10 dvirdi01

Hi,

This problem seems to come from the clustering step rather than the error correction step.

Please provide answers to the following questions to help us identify the issues and provide solutions.

  1. Is your clustering step output (clusters.out) file size 0 bytes?
  2. What is your clustering step command? And what is the log for your clustering step?
  3. Did you hit the out-of-memory issue in your clustering step? Normally, clustering uses more memory than error correction.

Eileen

eileen-xue avatar Oct 12 '23 00:10 eileen-xue

  1. None of my clusters.out files are 0 bytes, so I think the cluster and cluster extraction steps were working.
  2. This was my rule for the clustering step:

input:
    "data/.../..../{sample}.fastq.gz"
output:
    touch("data/.../{sample}.done")
params:
    outdir = "data/..../{sample}"
log:
    out = "log/.../{sample}.out",
    err = "log/.../{sample}.err"
threads: 48
resources:
    mem = 200
shell:
    """mkdir -p {params.outdir}; /storage/.../.../.../.../rattle cluster \
    --input {input} \
    --output {params.outdir} \
    --threads {threads} \
    --verbose \
    > {log.out} \
    2> {log.err}"""

In my log, my sample.out file says "Reads: ...some number..." and my sample.err says:

[================================================================================] 67715/67715 (100%)
Iteration 0.3 complete
[================================================================================] 24054/24054 (100%)
Iteration 0.25 complete
[================================================================================] 11360/11360 (100%)
Iteration 0.2 complete
[================================================================================] 7204/7204 (100%)
Iteration 0 complete
Gene clustering done
5507 gene clusters found

  3. I think I did for some of the files. For those, I re-ran it after allocating more memory.

dvirdi01 avatar Oct 12 '23 15:10 dvirdi01
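Re-running failed jobs by hand with more memory, as described above, can also be automated. In Snakemake a resource may be given as a function of `wildcards` and `attempt`, so the request grows each time a job is retried. The sketch below is only an illustration under stated assumptions: the helper name is hypothetical, the starting value of 100 is simply taken from the correction rule in this thread, and the unit of the `mem` resource depends on the cluster profile in use.

```python
# Starting request, taken from the rule in this thread.
# Assumption: the cluster profile decides which unit this number means.
BASE_MEM = 100


def mem_for_attempt(wildcards, attempt):
    """Scale the memory request linearly with the retry attempt (1, 2, 3, ...)."""
    return BASE_MEM * attempt


# In the Snakefile (not executed here) the rule would reference the function:
#     resources:
#         mem = mem_for_attempt
# Retries must be enabled when starting Snakemake (--restart-times or
# --retries, depending on the Snakemake version).
```

With this in place, a job killed by the out-of-memory handler would be resubmitted with 200, then 300, and so on, up to the configured number of retries.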

Hi,

Your RATTLE error correction step command is incorrect. To specify the outputs, you don't need to list the names and locations of all the output files. You only need to give an output folder location, like -o [out_dir]

Hope this helps. Eileen

eileen-xue avatar Oct 13 '23 03:10 eileen-xue
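The advice above can be sketched in Python: the folder is derived from one of the expected output files and passed once to `-o`. This is a hedged illustration, not RATTLE's or Snakemake's own code. The helper name and the bare `rattle` executable name are assumptions; the flags `-i`, `-c`, `-o` and `-t` and the example paths are the ones that appear in this thread.

```python
import os
import shlex


def rattle_correct_cmd(rattle_bin, fastq, clusters, out_dir, threads):
    """Build the argument list for `rattle correct` with a single output folder."""
    return [
        rattle_bin, "correct",
        "-i", fastq,
        "-c", clusters,
        "-o", out_dir,        # a folder, not the individual .fq files
        "-t", str(threads),
    ]


# Derive the folder from one of the files the rule expects to appear in it.
expected = "data/RATTLE_out/samplefile/corrected.fq"
out_dir = os.path.dirname(expected)

cmd = rattle_correct_cmd(
    "rattle",  # assumed to be on PATH; the thread uses an absolute path
    "data/.../.../samplefile.fastq",
    "data/RATTLE_out/samplefile/clusters.out",
    out_dir,
    48,
)
print(shlex.join(cmd))
```

Inside a Snakemake rule the same idea would let the three files stay declared as outputs while the shell command receives only their parent folder, for example through a `params` entry.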

  1. Hi, isn't that what I did though? I gave the output file location as params.outdir?

Edit: Oh, I think I get what you were saying. I had this earlier for my error correction step in my smk file:

output:
    touch("data/.../{sample}/corrected.fq"),
    touch("data/.../{sample}/uncorrected.fq"),
    touch("data/.../{sample}/consensi.fq")

but I should change it to:

output:
    touch("data/.../{sample}")

Is this what you meant? Also, in my Snakefile I had:

rule all:
    input:
        expand("data/..../{sample}/{filename}.fq",
               sample = config['samples'],
               filename = ["corrected", "uncorrected", "consensi"])

Would I need to change the expand command in my snakefile?


  2. Also, how about my cluster_summary.tsv file being empty? Was it due to the same error? I did not run the cluster extraction and cluster summary steps from snakemake; I ran them directly from the command line for all my files. This is what I had:
./rattle extract_clusters -i /storage/.../.../.../.../.../.../sample.fastq  -c /storage/.../.../.../.../.../sample/clusters.out -o /storage/.../..../.../.../.../sample/clusters --fastq

./rattle cluster_summary -i /storage/.../.../.../.../.../.../sample.fastq -c /storage/.../.../.../.../.../sample/clusters.out > /storage/.../.../.../.../.../sample/cluster_summary.tsv

Why did this command produce an empty tsv file?

dvirdi01 avatar Oct 13 '23 15:10 dvirdi01

  1. Your new output command is correct. If you want to use multiple fastq files as input, the format should be -i input_1.fq,input_2.fq,...,input_n.fq. All files must be separated by commas; no spaces or line breaks are allowed. Don't use Snakemake expand for RATTLE input, because expand will create new lines. Also, I don't understand why you are using corrected.fq, uncorrected.fq, consensi.fq as input. This will make your input and output the exact same files.

  2. Your command looks correct. Possible issues: the inputs of the cluster step and the cluster_summary step are not the same, or the location of your input.fastq file or clusters.out file is incorrect.

eileen-xue avatar Oct 16 '23 01:10 eileen-xue
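The comma-separated input format described above can be produced with a small helper. This is a sketch only: the function name is hypothetical, and the file names are the placeholders used in the reply. Snakemake renders a multi-file `{input}` with spaces between the paths by default, so one option is to pass a joined string through `params` instead.

```python
def rattle_input_arg(fastq_paths):
    """Join FASTQ paths into the comma-separated value expected after -i."""
    paths = [p.strip() for p in fastq_paths]
    # A comma or whitespace inside a path would break the comma-separated list.
    bad = [p for p in paths if "," in p or any(ch.isspace() for ch in p)]
    if bad:
        raise ValueError(f"paths must not contain commas or whitespace: {bad}")
    return ",".join(paths)


joined = rattle_input_arg(["input_1.fq", "input_2.fq", "input_n.fq"])
print(joined)
```

In a Snakefile the helper could be called from a `params` lambda such as `lambda wildcards, input: rattle_input_arg(input)`, with the result placed after `-i` in the shell command.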

Hi, thanks for the reply. I didn't understand why I would need to skip the extract clusters step. Wouldn't that step be necessary for the next step, which is cluster correction?

On Sun, Oct 15, 2023 at 7:39 PM Eileen Xue wrote:

1. Your new output command is correct. If you want to use multiple fastq files as input, the format should be -i input_1.fq,input_2.fq,...,input_n.fq. All files must be separated by commas; no spaces or line breaks are allowed. Don't use Snakemake expand for RATTLE input, because expand will create new lines. Also, I don't understand why you are using corrected.fq, uncorrected.fq, consensi.fq as input. This will make your input and output the exact same files.

2. Your command looks correct. You can skip extract_clusters. Possible issues: Inputs of the cluster step and cluster_summary step are not the same. Your input.fastq file and clusters.out file location is incorrect.


dvirdi01 avatar Oct 16 '23 02:10 dvirdi01

extract_clusters and cluster_summary are designed to make the results of the cluster step readable. Only the cluster step is required before the correction step.

eileen-xue avatar Oct 16 '23 02:10 eileen-xue