Salmon segfaulting on some

Open jdrnevich opened this issue 3 years ago • 1 comments

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)? bulk mode

Describe the bug We have 60 samples made from very low RNA inputs (cell captures), so the libraries were made with the Ovation Solo RNAseq kit from Tecan and sequenced 150 bp paired-end. One of the samples finished fine (though with a mapping rate of only 21%), but our Nextflow pipeline crashed on the second one, and on another one I tested, with a segfault and no other information that we can see as to why:

salmon quant -i ../data/references/salmon-1.4.0-ncbi-GRCm39_AND_egfp_Annot109 -l ISF \
    -1 trimmomatic/AAV_204M_TCCTGGTA_L001_R1_001.fastq.qualtrim.paired.fastq \
    -2 trimmomatic/AAV_204M_TCCTGGTA_L001_R2_001.fastq.qualtrim.paired.fastq \
    --numBootstraps=30 \
    --validateMappings --recoverOrphans \
    -o salmon/AAV_204M_TCCTGGTA_L001 \
    --seqBias --gcBias --writeUnmappedNames -p 8

#it ran for a while and then did:
processed 3,000,000 fragmentsintLog] [info] First decoy index : 129,698
hits: 760,262, hits per frag:  0.254757
Segmentation fault

I tried running just the R1 fastq file on its own and it finished fine without a segfault. The mapping rate was ~15%.

To Reproduce Specific to particular fastq files

Specifically, please provide at least the following information:

  • Which version of salmon was used? 1.4.0

  • How was salmon installed (compiled, downloaded executable, through bioconda)? compiled using CMake with gcc version 8.2.0 (not by me); easybuild config file is at https://github.com/IGBIllinois/easybuild/blob/master/easyconfigs/s/Salmon/Salmon-1.4.0-IGB-gcc-8.2.0.eb

  • Which reference (e.g. transcriptome) was used? Custom reference of NCBI GRCm39 + egfp protein, although same segfault occurs when using plain GRCm39 that has worked for many other SE and PE projects

  • Which read files were used? Owned by PI; I may or may not be able to send a pair to you

  • Which program options were used? See above example

Expected behavior Finishing without segfault like the first sample did. I can send you the salmon_quant.log or any other file that would be useful

Desktop (please complete the following information):

  • OS: CentOS 7.8

jdrnevich Jun 09 '21 22:06

UPDATE: On @rob-p's suggestion, I removed the --recoverOrphans option, and then all 60 samples finished without segfaulting. Perhaps there were too many orphans to handle: alignment rates were a dismal 0.5-23%. These were heavily degraded samples that the sequencing center recommended not sequencing, but the PI wanted to try anyway. If you want a pair of fastq files (full or cut down to ~5 M reads) to test this weird edge case, I can see about getting them to you. Thanks!
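
In case it helps, a cut-down pair could be made roughly like this. This is only a sketch: subset_pairs and the output prefix are hypothetical names, and it assumes uncompressed FASTQ with exactly 4 lines per record and R1/R2 in the same read order (which should hold for Trimmomatic's paired output). Whether the first ~5 M pairs are enough to trigger the segfault is untested.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first N read pairs of a paired-end set.
# Assumes uncompressed FASTQ, 4 lines per record, R1 and R2 in the same order.
subset_pairs() {
    n_reads=$1      # number of read pairs to keep
    r1=$2           # R1 FASTQ path
    r2=$3           # R2 FASTQ path
    out_prefix=$4   # output prefix (made-up naming scheme)

    # One FASTQ record is 4 lines: header, sequence, '+', qualities.
    n_lines=$((n_reads * 4))

    # Taking the same number of lines from both files keeps mates in sync.
    head -n "$n_lines" "$r1" > "${out_prefix}_R1.fastq"
    head -n "$n_lines" "$r2" > "${out_prefix}_R2.fastq"
}

# Example call (input names from the command above; output prefix is invented):
# subset_pairs 5000000 \
#     trimmomatic/AAV_204M_TCCTGGTA_L001_R1_001.fastq.qualtrim.paired.fastq \
#     trimmomatic/AAV_204M_TCCTGGTA_L001_R2_001.fastq.qualtrim.paired.fastq \
#     AAV_204M_subset
```

The subset could then be run through the same salmon quant command as above, with and without --recoverOrphans, to check whether the crash reproduces on the smaller files.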

jdrnevich Jun 16 '21 16:06