Rcorrector Encountering 'Failed at Stage 3' Error

Hello.

I am trying to reproduce the results from this study here and I am getting the following error with no additional information:

/mnt/xxx/rcorrector/jellyfish/bin/jellyfish bc -m 25 -s 100000000 -C -t 72 -o tmp_e700548b30e0b29fceb613ecd64460d4.bc <(gzip -cd ../raw-data/pbmc_fastqs/scRNA_PBMBCs_B2run2.fastq.gz) 
Count the kmers in the bloom filter
/mnt/xxx/rcorrector/jellyfish/bin/jellyfish count -m 25 -s 100000 -C -t 72 --bc tmp_e700548b30e0b29fceb613ecd64460d4.bc -o tmp_e700548b30e0b29fceb613ecd64460d4.mer_counts <(gzip -cd ../raw-data/pbmc_fastqs/scRNA_PBMBCs_B2run2.fastq.gz) 
Dump the kmers
/mnt/xxx/rcorrector/jellyfish/bin/jellyfish dump -L 2 tmp_e700548b30e0b29fceb613ecd64460d4.mer_counts > tmp_e700548b30e0b29fceb613ecd64460d4.jf_dump
Error correction
/mnt/xxx/rcorrector/rcorrector -k 25 -od ../raw-data/pbmc_fastqs/Rcorrected -t 72  -r ../raw-data/pbmc_fastqs/scRNA_PBMBCs_B2run2.fastq.gz -c tmp_e700548b30e0b29fceb613ecd64460d4.jf_dump
Failed at stage 3.

I have no idea why I am getting this error. My code and input files are nearly identical to those in the link above. Is there a way I could troubleshoot this further? Another thing I noticed is that the program gets a lot farther when I install a fresh clone of Rcorrector (this was from a previous attempt):

Stored 42636637 kmers
Weak kmer threshold rate: 0.109401 (estimated from 0.950/1 of the chosen kmers)
Bad quality threshold is '@'


^CFailed at stage 3.

I interrupted the process because I was worried it was going to take a long time and I wanted to submit it as a batch script instead.

Do you have any ideas? Should I try installing a new copy of Rcorrector with Jellyfish each time I want to use this software? I got this warning after running make:

...
make[1]: Entering directory '/mnt/xxx/software/rcorrector/jellyfish'
make[1]: Warning: File 'unit_tests/gtest/src/.deps/libgtest_main_la-gtest_main.Plo' has modification time 7.9 s in the future
make  all-am
make[2]: Entering directory '/mnt/xxx/software/rcorrector/jellyfish'
make[2]: Warning: File 'unit_tests/gtest/src/.deps/libgtest_main_la-gtest_main.Plo' has modification time 7.8 s in the future
...
make[2]: warning:  Clock skew detected.  Your build may be incomplete.
make[2]: Leaving directory '/mnt/xxx/software/rcorrector/jellyfish'
make[1]: warning:  Clock skew detected.  Your build may be incomplete.
make[1]: Leaving directory '/mnt/xxx/software/rcorrector/jellyfish'

Let me know what you think or if you have any additional questions.

Apr 14 '22 17:04 aseyedia

I'm also receiving the "Failed at stage 3" error. I can run the sample data just fine, and if I run just one pair of fastq files at a time Rcorrector finishes fine. I'm wondering if it is a memory issue?
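One way to test the memory hypothesis is to rerun the failing command through a small wrapper that keeps its output and reports how it ended. This is only a sketch: `run_and_report` and the log file name are hypothetical helpers, not part of Rcorrector. On Linux, an exit status above 128 means the process was terminated by a signal, and status 137 (SIGKILL) is often, though not always, a sign that the kernel's out-of-memory killer stepped in.

```shell
#!/usr/bin/env bash
# Sketch: run a command, keep its stdout/stderr in a log file, and report
# whether it finished, failed, or was killed by a signal.
# run_and_report is a hypothetical helper, not part of Rcorrector.

run_and_report() {
  log="$1"
  shift
  status=0
  # Capture everything the command prints so no diagnostics are lost.
  "$@" >"$log" 2>&1 || status=$?
  if [ "$status" -gt 128 ]; then
    # Shell convention: 128 + N means "terminated by signal N".
    echo "killed by signal $((status - 128)) (exit $status)"
  elif [ "$status" -ne 0 ]; then
    echo "exited with status $status"
  else
    echo "finished normally"
  fi
  return "$status"
}

# Example (commented out; substitute the exact command from your own log):
# run_and_report stage3.log /path/to/rcorrector -k 25 -t 8 -r reads.fastq.gz -c counts.jf_dump
```

If the wrapper reports signal 9, checking the system log or the scheduler's job accounting for an out-of-memory event would be a reasonable next step.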

Is there any difference between running an entire set of samples at once versus running them one sample at a time? If Rcorrector will return the same results in both of these cases, I would just go ahead and run the samples individually rather than troubleshooting why running them all at once results in this error.

Jul 27 '22 20:07 dthom7

@aseyedia Sorry for the late reply. It seems I missed your issue somehow. Several bugs were fixed in the most recent version, which is probably why you got further with the freshly installed copy. As for Jellyfish, if it can still be executed, you can ignore those warnings from make.
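A quick way to check whether the bundled Jellyfish can still be executed after a build with clock-skew warnings is sketched below. The default path is an assumption modelled on the layout in the log above, and `check_jellyfish` is a hypothetical helper; it relies only on `jellyfish --version` exiting successfully.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a Jellyfish binary exists and actually runs.
# The default path is an assumption; point JELLYFISH at your own install.
JELLYFISH="${JELLYFISH:-./jellyfish/bin/jellyfish}"

check_jellyfish() {
  bin="$1"
  if [ ! -x "$bin" ]; then
    echo "not executable: $bin"
    return 1
  fi
  # A usable build should print its version and exit with status 0.
  if "$bin" --version >/dev/null 2>&1; then
    echo "ok: $bin"
  else
    echo "present but failed to run: $bin"
    return 1
  fi
}

check_jellyfish "$JELLYFISH" || true
```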

@dthom7 What command do you use when running all the samples? Running the samples together and running them individually give different results. Because transcript expression patterns and sequencing depth differ from sample to sample, I would recommend running Rcorrector on each sample separately.

Jul 28 '22 01:07 mourisl

@mourisl Thanks for your response. I was running it with the command perl ~/programs/Rcorrector-1.0.5/run_rcorrector.pl -1 ${R1_input} -2 ${R2_input} where R1_input and R2_input are comma-separated lists of 34 fastq.gz files.

It runs fine when I run it separately on each sample, and if that is how you'd recommend doing it anyway then I can just proceed that way.

Jul 29 '22 20:07 dthom7

@dthom7 The command looks fine to me. I guess it could be a memory issue. It takes a large amount of memory to count the kmers from 34 samples. It is better to run them separately for both computational and biological reasons.
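For readers who want to follow the sample-by-sample recommendation, a minimal sketch of a per-sample loop is shown below. It assumes a naming scheme that does not come from this thread (`<sample>_R1.fastq.gz` paired with `<sample>_R2.fastq.gz`), and `FASTQ_DIR`, `OUT_DIR`, `THREADS` and `DRY_RUN` are hypothetical settings. By default it only prints the commands, so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Sketch: call run_rcorrector.pl once per read pair instead of passing
# comma-separated lists of every sample in a single invocation.
# Assumed (not from the thread): files are named <sample>_R1.fastq.gz and
# <sample>_R2.fastq.gz; all variables below are illustrative defaults.
set -eu

RCORRECTOR="${RCORRECTOR:-$HOME/programs/Rcorrector-1.0.5/run_rcorrector.pl}"
FASTQ_DIR="${FASTQ_DIR:-.}"
OUT_DIR="${OUT_DIR:-Rcorrected}"
THREADS="${THREADS:-8}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

correct_each_sample() {
  for r1 in "$FASTQ_DIR"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "skip: no mate file for $r1" >&2
      continue
    fi
    if [ "$DRY_RUN" = "1" ]; then
      echo perl "$RCORRECTOR" -1 "$r1" -2 "$r2" -t "$THREADS" -od "$OUT_DIR"
    else
      perl "$RCORRECTOR" -1 "$r1" -2 "$r2" -t "$THREADS" -od "$OUT_DIR"
    fi
  done
}

correct_each_sample
```

Setting `DRY_RUN=0` runs the commands one after another, so the k-mer counting only ever has to hold a single sample in memory.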

Jul 29 '22 20:07 mourisl