Consistent salmon quant segfault
Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
salmon
Describe the bug
salmon quant consistently segfaults when run through my SLURM cluster. I've attempted runs on m4.2xlarge and m4.8xlarge worker nodes.
Aug 16 19:38:23 ip-172-31-30-93 kernel: [ 681.083866] salmon[4167]: segfault at 2641a ip 00007fe2fcdc2dca sp 00007fff27128b90 error 4 in libtbb.so.2[7fe2fcda0000+37000]
To Reproduce
- Which version of salmon was used?
salmon 0.9.1
- How was salmon installed (compiled, downloaded executable, through bioconda)?
- Installed through conda
conda create -y --name salmon@0.9.1 bzip2=1.0.6 salmon=0.9.1 seqtk=1.2
- Which reference (e.g. transcriptome) was used?
ftp://ftp.ensembl.org/pub/release-81/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
- Which read files were used?
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_1.fastq.gz
- Which program options were used?
{
"salmon_version": "0.9.1",
"index": "./index",
"libType": "U",
"unmatedReads": "./single.fastq",
"output": "./output",
"allowOrphansFMD": [],
"threads": "8",
"incompatPrior": "1e-20",
"biasSpeedSamp": "1",
"fldMax": "1000",
"fldMean": "200",
"fldSD": "80",
"forgettingFactor": "0.65",
"maxOcc": "200",
"maxReadOcc": "100",
"numBiasSamples": "2000000",
"numAuxModelSamples": "5000000",
"numPreAuxModelSamples": "1000000",
"numGibbsSamples": "0",
"numBootstraps": "0",
"vbPrior": "0.001",
"auxDir": "aux_info"
}
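For reference, the full failing setup condenses to roughly the following (a sketch only; the env name and exact paths are illustrative rather than the literal Galaxy job script):

```bash
# Set up the environment and inputs (env name and paths illustrative)
conda create -y --name salmon@0.9.1 bzip2=1.0.6 salmon=0.9.1 seqtk=1.2
conda activate "salmon@0.9.1"
wget ftp://ftp.ensembl.org/pub/release-81/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_1.fastq.gz
gunzip -k SRR1039509_1.fastq.gz && ln -s SRR1039509_1.fastq ./single.fastq

# Build the index and quantify, mirroring the options above
mkdir -p ./index ./output
salmon index --transcripts Homo_sapiens.GRCh38.cdna.all.fa.gz \
    --index ./index --type quasi --kmerLen 31
salmon quant --index ./index --libType U --unmatedReads ./single.fastq \
    --output ./output --threads 8
```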
Expected behavior
For salmon quant to run to completion
Desktop (please complete the following information):
Ubuntu Linux
Linux ip-172-31-24-127.ec2.internal 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
Additional context
- This SLURM cluster is managed by Galaxy CloudMan, and my installation of Salmon is currently constrained to a conda install.
- I've fiddled with many different CPU/memory requirements for each of these salmon jobs and have only had successful runs while using a single thread on a single node (--ntasks=1 --nodes=1), but even then segfaults were observed intermittently (see the sketch after this list).
- The current Galaxy tool wrapper for Salmon runs `salmon index ... && salmon quant ...` for every input fastq by default, but I've also generated and pointed `salmon quant` at a common index and have observed the same segfault behavior. I've also tried the --perfectHash flag in both of these scenarios, to no avail.
- I have the ability to specify/wrap another version of Salmon to be compatible with Galaxy if the thought is that a more recent release could help.
- I'm happy to provide any further context that could help solve the issue!
- Also, I lack any biological insight, so I'll ping my colleague @gmnelson for backup in that space.
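For concreteness, the single-thread configuration that mostly succeeded corresponds to a standalone batch script along these lines (a hypothetical sketch; the real job script is generated by Galaxy):

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
# Single-thread salmon quant; even this configuration segfaulted intermittently
salmon quant --index ./index --libType U --unmatedReads ./single.fastq \
    --output ./output --threads 1
```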
Terminal Output
Example Output
Fatal error: Exit code 139 ()
Version Info: ### A newer version of Salmon is available. ####
###
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.
###
[2018-08-16 19:42:27.806] [jLog] [info] building index
RapMap Indexer
[Step 1 of 4] : counting k-mers
[2018-08-16 19:42:27.811] [jointLog] [warning] Entry with header [ENST00000434970.2], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[2018-08-16 19:42:27.811] [jointLog] [warning] Entry with header [ENST00000448914.1], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[... 57 additional identical short-transcript warnings elided ...]
Elapsed time: 17.1995s
[2018-08-16 19:42:45.008] [jointLog] [warning] Removed 11768 transcripts that were sequence duplicates of indexed transcripts.
[2018-08-16 19:42:45.008] [jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
Replaced 5 non-ATCG nucleotides
Clipped poly-A tails from 1453 transcripts
Building rank-select dictionary and saving to disk done
Elapsed time: 0.0193769s
Writing sequence data to file . . . done
Elapsed time: 0.138102s
[info] Building 32-bit suffix array (length of generalized text is 289267207)
Building suffix array . . . success
saving to disk . . . done
Elapsed time: 0.595015s
done
Elapsed time: 34.8393s
processed 0 positions
[... "processed N positions" progress lines elided, one per 1,000,000 positions ...]
processed 289000000 positions
khash had 109134690 keys
saving hash to disk . . . done
Elapsed time: 7.61947s
[2018-08-16 19:47:14.359] [jLog] [info] done building index
Version Info: ### A newer version of Salmon is available. ####
###
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.
###
### salmon (mapping-based) v0.9.1
### [ program ] => salmon
### [ command ] => quant
### [ index ] => { ./index }
### [ libType ] => { U }
### [ unmatedReads ] => { ./single.fastq }
### [ output ] => { ./output }
### [ allowOrphansFMD ] => { }
### [ threads ] => { 16 }
### [ incompatPrior ] => { 1e-20 }
### [ biasSpeedSamp ] => { 1 }
### [ fldMax ] => { 1000 }
### [ fldMean ] => { 200 }
### [ fldSD ] => { 80 }
### [ forgettingFactor ] => { 0.65 }
### [ maxOcc ] => { 200 }
### [ maxReadOcc ] => { 100 }
### [ numBiasSamples ] => { 2000000 }
### [ numAuxModelSamples ] => { 5000000 }
### [ numPreAuxModelSamples ] => { 1000000 }
### [ numGibbsSamples ] => { 0 }
### [ numBootstraps ] => { 0 }
### [ vbPrior ] => { 0.001 }
Logs will be written to ./output/logs
[2018-08-16 19:47:14.418] [jointLog] [info] parsing read library format
[2018-08-16 19:47:14.418] [jointLog] [info] There is 1 library.
[2018-08-16 19:47:14.460] [stderrLog] [info] Loading Suffix Array
[2018-08-16 19:47:14.459] [jointLog] [info] Loading Quasi index
[2018-08-16 19:47:14.459] [jointLog] [info] Loading 32-bit quasi index
[2018-08-16 19:47:15.044] [stderrLog] [info] Loading Transcript Info
[2018-08-16 19:47:15.207] [stderrLog] [info] Loading Rank-Select Bit Array
[2018-08-16 19:47:15.263] [stderrLog] [info] There were 173531 set bits in the bit array
[2018-08-16 19:47:15.285] [stderrLog] [info] Computing transcript lengths
[2018-08-16 19:47:15.285] [stderrLog] [info] Waiting to finish loading hash
[2018-08-16 19:47:20.808] [jointLog] [info] done
[2018-08-16 19:47:20.808] [jointLog] [info] Index contained 173531 targets
[2018-08-16 19:47:20.808] [stderrLog] [info] Done loading index
processed 500002 fragments
hits: 2213374; hits per frag: 5.08859
processed 1000002 fragments
hits: 4422312; hits per frag: 4.78092
[... per-500,000-fragment progress lines elided ...]
processed 21000000 fragments
hits: 92771131; hits per frag: 4.42429
[2018-08-16 19:47:49.632] [jointLog] [info] Computed 260771 rich equivalence classes for further processing
[2018-08-16 19:47:49.632] [jointLog] [info] Counted 19352476 total reads in the equivalence classes
[2018-08-16 19:47:49.646] [jointLog] [info] Mapping rate = 91.4764%
[2018-08-16 19:47:49.646] [jointLog] [info] finished quantifyLibrary()
[2018-08-16 19:47:49.649] [jointLog] [info] Starting optimizer
/mnt/galaxy/tmp/job_working_directory/000/900/tool_script.sh: line 50: 5733 Segmentation fault (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxOcc 200 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --vbPrior 0.001
EDIT (2018-08-24): I haven't been able to reproduce the segfault outside of SLURM.
Hi Scott,
Thank you for the detailed report. I'm trying to reproduce the issue. So far, I have been unable to reproduce it on an Ubuntu 16.04 or OS X box with either 0.11.1 or 0.9.1. My next test is to try an Ubuntu 14.04 Docker container. I'm afraid there may be a system library issue involved. Could you try upgrading via bioconda as well, to see if that helps? The latest Linux release is available on bioconda.
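For example, something along these lines should pull the latest build into a fresh environment (a sketch; it assumes the bioconda and conda-forge channels are available):

```bash
# Create a fresh environment with the latest salmon from bioconda
conda create -y --name salmon-latest -c conda-forge -c bioconda salmon
conda activate salmon-latest
salmon --version
```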
@rob-p Thanks for your quick reply! I'll try this out with a more recent conda installation of salmon and report back.
I've created a new conda environment based on salmon==0.11.2 and was able to run it successfully outside of Galaxy/SLURM on the same 14.04 instance.
I had to omit the --sasamp and --maxOcc options that had been used in 0.9.1, since they no longer seem to exist in the newer version.
@rob-p I've taken the time to update salmon to 0.11.2 in its respective Galaxy tool wrapper and am still seeing the salmon quant segfault when running through SLURM.
Bioconda installs of salmon 0.9.1 and 0.11.2 run to completion outside of SLURM on the same machine.
I've seen that #268 was opened and closed recently, but I don't have the liberty to resolve the salmon dependency outside of conda (at least not easily or in a timely fashion).
Update: Have since filed https://github.com/bioconda/bioconda-recipes/issues/10662
@scottx611x If you submit the job from your command line to SLURM it crashes, but if you run it locally it succeeds?
@bgruening Almost. The same command, copied and pasted from the failed Galaxy job, works outside of SLURM on the same worker node. I haven't tried submitting to SLURM from outside of Galaxy, but I could try that as well.
I had been using the following command, with salmon being an alias to the salmon binary from the mulled conda env that Galaxy created.
mkdir ./index && mkdir ./output && salmon index --transcripts /mnt/galaxy/files/001/dataset_1239.dat --kmerLen 31 --threads "${GALAXY_SLOTS:-4}" --index './index' --type 'quasi' && ln -s /mnt/galaxy/files/001/dataset_1240.dat ./single.fastq && salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
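Submitting that same command directly to SLURM, outside of Galaxy, would presumably look something like this (an untested sketch):

```bash
# Hypothetical direct submission, bypassing the Galaxy-generated tool script
sbatch --ntasks=1 --nodes=1 --wrap \
    "salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --threads 1"
```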
Maybe SLURM is killing your job because of too little allocated memory and the error message is just really weird?
@bgruening So I've tried some runs today with higher memory configurations and can still reproduce the segfault. I'm going to continue on and try to write up a reproducer for @dpryan79 here.
salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem=25000
scontrol show job 94
JobId=94 Name=g990_salmon_refinery_stemcellcommons_org
UserId=galaxy(1001) GroupId=users(100)
Priority=4294901667 Account=(null) QOS=(null)
JobState=COMPLETED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:07:32 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2018-08-27T15:36:41 EligibleTime=2018-08-27T15:36:41
StartTime=2018-08-27T15:36:41 EndTime=2018-08-27T15:44:13
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=main AllocNode:Sid=ip-172-31-24-127:21595
ReqNodeList=(null) ExcNodeList=(null)
NodeList=w19
BatchHost=w19
NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=25000M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/mnt/galaxy/tmp/job_working_directory/000/990
Galaxy stderr
Fatal error: Exit code 139 ()
...
/mnt/galaxy/tmp/job_working_directory/000/990/tool_script.sh: line 50: 5713 Segmentation fault (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
syslog
ip-172-31-30-93 kernel: [ 681.083866] salmon[4167]: segfault at 2641a ip 00007fe2fcdc2dca sp 00007fff27128b90 error 4 in libtbb.so.2[7fe2fcda0000+37000]
salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem=100000
scontrol show job 98
JobId=98 Name=g994_salmon_refinery_stemcellcommons_org
UserId=galaxy(1001) GroupId=users(100)
Priority=4294901663 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:08:19 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2018-08-27T20:06:23 EligibleTime=2018-08-27T20:06:23
StartTime=2018-08-27T20:06:23 EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=main AllocNode:Sid=ip-172-31-24-127:2236
ReqNodeList=(null) ExcNodeList=(null)
NodeList=w21
BatchHost=w21
NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryCPU=100000M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/mnt/galaxy/tmp/job_working_directory/000/994
Galaxy stderr
Fatal error: Exit code 139 ()
...
/mnt/galaxy/tmp/job_working_directory/000/994/tool_script.sh: line 50: 7495 Segmentation fault (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
syslog
Aug 27 20:14:23 ip-172-31-16-139 kernel: [ 2134.447133] traps: salmon[7495] general protection ip:7ff9ce320dca sp:7ffd6e497020 error:0 in libtbb.so.2[7ff9ce2fe000+37000]
salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem-per-cpu=100000
scontrol show job 99
JobId=99 Name=g995_salmon_refinery_stemcellcommons_org
UserId=galaxy(1001) GroupId=users(100)
Priority=4294901662 Account=(null) QOS=(null)
JobState=COMPLETED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:07:36 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2018-08-27T20:20:26 EligibleTime=2018-08-27T20:20:26
StartTime=2018-08-27T20:20:26 EndTime=2018-08-27T20:28:02
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=main AllocNode:Sid=ip-172-31-24-127:7975
ReqNodeList=(null) ExcNodeList=(null)
NodeList=w21
BatchHost=w21
NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=100000M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/mnt/galaxy/tmp/job_working_directory/000/995
Galaxy stderr
Fatal error: Exit code 139 ()
...
/mnt/galaxy/tmp/job_working_directory/000/995/tool_script.sh: line 50: 9700 Segmentation fault (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
syslog
Aug 27 20:27:57 ip-172-31-16-139 kernel: [ 2949.318784] traps: salmon[9700] general protection ip:7fb66057cdca sp:7ffe1bf3a900 error:0 in libtbb.so.2[7fb66055a000+37000]
I can't reproduce this using 0.11.2 on Galaxy (18.05, not that that should matter) with a slurm (17.02.9) cluster. I've tried using both 20 cores and 1 core (in case something weird is going on with the threading) and both run fine. I used our cluster default of 6GB per core, which is overkill for this job. My guess is that the same tbb version is getting used in each version of salmon you're trying and that it got corrupted at some point. Are you spinning up a new CloudMan instance for these runs or are you restarting a saved instance? If you're not starting a brand new instance then try that, then you can avoid using the same possibly corrupted tbb install.
@dpryan79 Thanks for trying to reproduce this, I really appreciate it. We're currently bringing up CloudMan instances derived from shared cluster strings. I'll try to bring up a fresh CloudMan instance and see whether I get the same (working) behavior that you do.
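In the meantime, a quick way to check whether the worker nodes are all resolving the same (possibly corrupted) libtbb.so.2 might be something like this (the library path is illustrative):

```bash
# Show which TBB shared library the salmon binary actually resolves to
ldd "$(which salmon)" | grep -i tbb
# Then compare checksums of that library across worker nodes
md5sum /path/to/libtbb.so.2   # run on each node and compare the output
```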
Did this ever get tracked down? We are having a situation where salmon seems to segfault whenever using SLURM (this time it's salmon index that segfaults, though). Wondering if you figured out a solution.
@nsheff Sorry, I was never able to dig into this further.
Also getting a segmentation fault. Any progress on this? This is salmon v1.3.0, installed with conda or using the binary, running in SLURM. I do not get a segmentation fault if I pass only a single file, but I do if I pass two files.
$ ./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libType U -t GRCh38_latest_rna.fa -a data/processed/bwa-mem/SRR10571655.sam data/processed/bwa-mem/SRR10571656.sam -o _tmp/
Version Info Exception: server did not respond before timeout
# salmon (alignment-based) v1.3.0
# [ program ] => salmon
# [ command ] => quant
# [ threads ] => { 32 }
# [ libType ] => { U }
# [ targets ] => { GRCh38_latest_rna.fa }
# [ alignments ] => { data/processed/bwa-mem/SRR10571655.sam data/processed/bwa-mem/SRR10571656.sam }
# [ output ] => { _tmp/ }
Logs will be written to _tmp/logs
[2020-10-12 16:13:21.969] [jointLog] [info] setting maxHashResizeThreads to 32
[2020-10-12 16:13:21.969] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
Library format { type:single end, relative orientation:none, strandedness:unstranded }
[2020-10-12 16:13:21.969] [jointLog] [info] numQuantThreads = 26
parseThreads = 6
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "data/processed/bwa-mem/SRR10571655.sam", fasta = "GRCh38_latest_rna.fa" . . .done
[2020-10-12 16:13:26.979] [jointLog] [info] replaced 5 non-ACGT nucleotides with random nucleotides
processed 103000000 reads in current round[1] 1994 segmentation fault (core dumped) ./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libTyp
Always at 103000000 reads.
Hi @izaakm,
This segfault is unlikely to be related to the issue here, since that one happened in "mapping mode" (salmon performing the mapping itself), and yours is happening in alignment-based mode (you're feeding SAM files to salmon). Does the segfault go away when you provide either of the SAM files to salmon on its own? That is, does it run to completion with data/processed/bwa-mem/SRR10571655.sam and data/processed/bwa-mem/SRR10571656.sam individually? Also, what if you combine them via a pipe (i.e. something like):
./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libType U -t GRCh38_latest_rna.fa -a <(cat data/processed/bwa-mem/SRR10571655.sam <(samtools view data/processed/bwa-mem/SRR10571656.sam)) -o _tmp/
The double redirect is just to make sure the header isn't included in the second SAM file. Also, is the reference that you are passing to the -t option identical to the one with which bwa-mem was run? If the problem persists, we might need the SAM/BAM files to track it down further, since I imagine it may be data-dependent.
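If process substitution is awkward in your environment, an equivalent check without it would be something along these lines (assuming the two SAM files share an identical header):

```bash
# samtools view -h keeps the header; omitting -h on the second file drops it
samtools view -h data/processed/bwa-mem/SRR10571655.sam > combined.sam
samtools view data/processed/bwa-mem/SRR10571656.sam >> combined.sam
./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) \
    --libType U -t GRCh38_latest_rna.fa -a combined.sam -o _tmp/
```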
--Rob
It does run with each of the two files separately, but when I try the command with the double redirect I get a message like the one below for many (possibly all) of the sequences in the reference, and quant.sf is empty (except for the header).
[2020-10-12 17:05:47.406] [jointLog] [warning] Transcript XM_024446103.1 appears in the reference but did not appear in the BAM
That is interesting. The intent of the double redirect was to include all alignment records from the second SAM file, simply concatenated to the first. Assuming the SAM files contain the same header, this should be OK (it's simply another way to treat them as a single input). However, this warning suggests that there were references in the file passed to -t that had no corresponding entry in the SAM file. Yet, with the redirect, the first SAM file should contain the full header. I don't yet have a clear understanding of why this would happen.