
Consistent salmon quant segfault

Open scottx611x opened this issue 7 years ago • 17 comments

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)? salmon

Describe the bug Running salmon quant through my SLURM cluster consistently segfaults. I've attempted runs on m4.2xlarge & m4.8xlarge worker nodes.

Aug 16 19:38:23 ip-172-31-30-93 kernel: [ 681.083866] salmon[4167]: segfault at 2641a ip 00007fe2fcdc2dca sp 00007fff27128b90 error 4 in libtbb.so.2[7fe2fcda0000+37000]
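The kernel line above is enough to locate the crash inside libtbb: subtracting the mapping base from the instruction pointer gives the offset within the library, which `addr2line` can then symbolize if a `libtbb.so.2` with debug symbols is available (the library path below is an assumption):

```shell
# Offset of the faulting instruction within libtbb.so.2
# (ip minus the mapping base reported by the kernel):
printf '0x%x\n' $(( 0x00007fe2fcdc2dca - 0x00007fe2fcda0000 ))  # → 0x22dca

# Symbolize the offset (needs a libtbb.so.2 with symbols; path is illustrative):
# addr2line -f -C -e /usr/lib/x86_64-linux-gnu/libtbb.so.2 0x22dca
```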

To Reproduce

  • Which version of salmon was used?
    • salmon 0.9.1
  • How was salmon installed (compiled, downloaded executable, through bioconda)?
    • Installed through conda
    • conda create -y --name [email protected] bzip2=1.0.6 salmon=0.9.1 seqtk=1.2
  • Which reference (e.g. transcriptome) was used?
    • ftp://ftp.ensembl.org/pub/release-81/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
  • Which read files were used?
    • ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_1.fastq.gz
  • Which program options were used?
{
    "salmon_version": "0.9.1",
    "index": "./index",
    "libType": "U",
    "unmatedReads": "./single.fastq",
    "output": "./output",
    "allowOrphansFMD": [],
    "threads": "8",
    "incompatPrior": "1e-20",
    "biasSpeedSamp": "1",
    "fldMax": "1000",
    "fldMean": "200",
    "fldSD": "80",
    "forgettingFactor": "0.65",
    "maxOcc": "200",
    "maxReadOcc": "100",
    "numBiasSamples": "2000000",
    "numAuxModelSamples": "5000000",
    "numPreAuxModelSamples": "1000000",
    "numGibbsSamples": "0",
    "numBootstraps": "0",
    "vbPrior": "0.001",
    "auxDir": "aux_info"
}
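Condensed, the failing run can be reproduced roughly as follows (a sketch: the environment name and working paths are illustrative, and the quant flags are trimmed to the essentials from the JSON above):

```shell
# Sketch of the failing pipeline; assumes conda, wget, and enough RAM/disk.
conda create -y --name salmon-0.9.1 salmon=0.9.1 seqtk=1.2 bzip2=1.0.6
source activate salmon-0.9.1

# Reference transcriptome and reads from the links above
wget ftp://ftp.ensembl.org/pub/release-81/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR103/009/SRR1039509/SRR1039509_1.fastq.gz
gunzip -c SRR1039509_1.fastq.gz > single.fastq

# Index, then quantify (defaults for the remaining options)
salmon index --transcripts Homo_sapiens.GRCh38.cdna.all.fa.gz \
    --kmerLen 31 --type quasi --index ./index
salmon quant --index ./index --libType U --unmatedReads ./single.fastq \
    --output ./output --threads 8
```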

Expected behavior For salmon quant to run to completion

Desktop (please complete the following information):

Ubuntu Linux
Linux ip-172-31-24-127.ec2.internal 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.5 LTS
Release:	14.04
Codename:	trusty

Additional context

  • This SLURM cluster is managed by Galaxy Cloudman, and my installation of Salmon is currently constrained to a conda install.

  • I've fiddled with many different CPU/memory requirements for these salmon jobs and have only had successful runs when using a single thread on a single node (--ntasks=1 --nodes=1); even then, segfaults still occurred intermittently.

  • The current Galaxy Tool wrapper for Salmon runs salmon index ... && salmon quant ... for every input fastq by default, but I've also generated and pointed salmon quant to a common index and have observed the same segfault behavior. I've also tried out the --perfectHash flag in both of these scenarios to no avail.

  • I have the ability to specify/wrap another version of Salmon to be compatible with Galaxy if the thought is that a more recent release could help.

  • I'm happy to provide any context past this that could help solve the issue!

  • Also, I lack any biological insight so I'll ping my colleague @gmnelson for backup in that space.
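For reference, the single-thread workaround mentioned above amounts to a batch script like this (a minimal sketch; the memory request and paths are assumptions):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=25000
#SBATCH --job-name=salmon-quant-single-thread

# Single-threaded salmon quant: the only configuration that has
# completed, and even then only intermittently.
salmon quant --index ./index --libType U --unmatedReads ./single.fastq \
    --output ./output --threads 1
```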

Terminal Output

Example Output
Fatal error: Exit code 139 ()
Version Info: ### A newer version of Salmon is available. ####
###
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.
###
[2018-08-16 19:42:27.806] [jLog] [info] building index
RapMap Indexer

[Step 1 of 4] : counting k-mers
[2018-08-16 19:42:27.811] [jointLog] [warning] Entry with header [ENST00000434970.2], had length less than the k-mer length of 31 (perhaps after poly-A clipping)
[... 58 additional "had length less than the k-mer length of 31" warnings for short transcript entries elided ...]
Elapsed time: 17.1995s

[2018-08-16 19:42:45.008] [jointLog] [warning] Removed 11768 transcripts that were sequence duplicates of indexed transcripts.
[2018-08-16 19:42:45.008] [jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
Replaced 5 non-ATCG nucleotides
Clipped poly-A tails from 1453 transcripts
Building rank-select dictionary and saving to disk done
Elapsed time: 0.0193769s
Writing sequence data to file . . . done
Elapsed time: 0.138102s
[info] Building 32-bit suffix array (length of generalized text is 289267207)
Building suffix array . . . success
saving to disk . . . done
Elapsed time: 0.595015s
done
Elapsed time: 34.8393s


processed 0 positions

[... "processed N positions" progress lines (1000000 through 289000000) elided ...]
khash had 109134690 keys
saving hash to disk . . . done
Elapsed time: 7.61947s
[2018-08-16 19:47:14.359] [jLog] [info] done building index
Version Info: ### A newer version of Salmon is available. ####
###
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.
###
### salmon (mapping-based) v0.9.1
### [ program ] => salmon 
### [ command ] => quant 
### [ index ] => { ./index }
### [ libType ] => { U }
### [ unmatedReads ] => { ./single.fastq }
### [ output ] => { ./output }
### [ allowOrphansFMD ] => { }
### [ threads ] => { 16 }
### [ incompatPrior ] => { 1e-20 }
### [ biasSpeedSamp ] => { 1 }
### [ fldMax ] => { 1000 }
### [ fldMean ] => { 200 }
### [ fldSD ] => { 80 }
### [ forgettingFactor ] => { 0.65 }
### [ maxOcc ] => { 200 }
### [ maxReadOcc ] => { 100 }
### [ numBiasSamples ] => { 2000000 }
### [ numAuxModelSamples ] => { 5000000 }
### [ numPreAuxModelSamples ] => { 1000000 }
### [ numGibbsSamples ] => { 0 }
### [ numBootstraps ] => { 0 }
### [ vbPrior ] => { 0.001 }
Logs will be written to ./output/logs
[2018-08-16 19:47:14.418] [jointLog] [info] parsing read library format
[2018-08-16 19:47:14.418] [jointLog] [info] There is 1 library.
[2018-08-16 19:47:14.460] [stderrLog] [info] Loading Suffix Array 
[2018-08-16 19:47:14.459] [jointLog] [info] Loading Quasi index
[2018-08-16 19:47:14.459] [jointLog] [info] Loading 32-bit quasi index
[2018-08-16 19:47:15.044] [stderrLog] [info] Loading Transcript Info 
[2018-08-16 19:47:15.207] [stderrLog] [info] Loading Rank-Select Bit Array
[2018-08-16 19:47:15.263] [stderrLog] [info] There were 173531 set bits in the bit array
[2018-08-16 19:47:15.285] [stderrLog] [info] Computing transcript lengths
[2018-08-16 19:47:15.285] [stderrLog] [info] Waiting to finish loading hash
[2018-08-16 19:47:20.808] [jointLog] [info] done
[2018-08-16 19:47:20.808] [jointLog] [info] Index contained 173531 targets
[2018-08-16 19:47:20.808] [stderrLog] [info] Done loading index

processed 500002 fragments
hits: 2213374; hits per frag:  5.08859
[... per-500000-fragment progress lines elided ...]
processed 21000000 fragments
hits: 92771131; hits per frag:  4.42429

[2018-08-16 19:47:49.632] [jointLog] [info] Computed 260771 rich equivalence classes for further processing
[2018-08-16 19:47:49.632] [jointLog] [info] Counted 19352476 total reads in the equivalence classes 
[2018-08-16 19:47:49.646] [jointLog] [info] Mapping rate = 91.4764%

[2018-08-16 19:47:49.646] [jointLog] [info] finished quantifyLibrary()
[2018-08-16 19:47:49.649] [jointLog] [info] Starting optimizer
/mnt/galaxy/tmp/job_working_directory/000/900/tool_script.sh: line 50:  5733 Segmentation fault      (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxOcc 200 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --vbPrior 0.001

scottx611x avatar Aug 16 '18 20:08 scottx611x

EDIT: 8-24-18 I haven't been able to reproduce the segfault outside of SLURM

scottx611x avatar Aug 17 '18 12:08 scottx611x

Hi scott,

Thank you for the detailed report. I'm trying to reproduce the issue. So far, I have been unable to reproduce it on an Ubuntu 16.04 or OSX box with either 0.11.1 or 0.9.1. My next test is to try an Ubuntu 14.04 Docker container. I'm afraid there may be a system library issue involved. Could you try upgrading via bioconda as well to see if that helps? The latest Linux release is available on bioconda.

rob-p avatar Aug 17 '18 12:08 rob-p

@rob-p Thanks for your quick reply! I'll try this out with a more recent conda installation of salmon and report back

scottx611x avatar Aug 17 '18 14:08 scottx611x

I've created a new conda environment based off of salmon==0.11.2 and was able to run it successfully outside of Galaxy/SLURM on the same 14.04 instance.

I had to omit the --sasamp and --maxOcc options that had been used in 0.9.1, since they no longer exist in the newer version.

scottx611x avatar Aug 17 '18 17:08 scottx611x

@rob-p I've taken the time to update salmon to 0.11.2 in its respective Galaxy Tool wrapper and am still seeing the salmon quant segfault when running through SLURM.

bioconda installs of salmon 0.9.1 & 0.11.2 run to completion outside of SLURM on the same machine.

I've seen that #268 was opened and closed recently, but I don't have the liberty to resolve the salmon dependency outside of conda (at least very easily/in a timely fashion).

Update: Have since filed https://github.com/bioconda/bioconda-recipes/issues/10662

scottx611x avatar Aug 24 '18 17:08 scottx611x

@scottx611x if you submit the job from your commandline to slurm it crashes, but if you run it locally it succeeds?

bgruening avatar Aug 24 '18 20:08 bgruening

@bgruening Almost. The same command, copied and pasted from the failed Galaxy job, works outside of SLURM on the same worker node. I haven't tried submitting to SLURM from outside of Galaxy, but I could try that as well.

I had been using the following command with salmon being an alias to the salmon from the mulled conda env that galaxy created.

mkdir ./index && mkdir ./output && salmon index --transcripts /mnt/galaxy/files/001/dataset_1239.dat --kmerLen 31 --threads "${GALAXY_SLOTS:-4}" --index './index' --type 'quasi'  && ln -s /mnt/galaxy/files/001/dataset_1240.dat ./single.fastq && salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans  --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65    --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20    --biasSpeedSamp 1  --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65    --maxReadOcc 100   --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0  --vbPrior 0.001  --sigDigits 3
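To take Galaxy out of the equation, roughly the same invocation could be submitted to SLURM directly via a bare sbatch script (a sketch: `GALAXY_SLOTS` is replaced by an explicit thread count, the scoring/consensus flags are trimmed, and the dataset paths are as in the command above):

```shell
#!/bin/bash
#SBATCH --ntasks=1 --nodes=1 --mem=25000

# Same index-then-quant pipeline the Galaxy wrapper runs, minus Galaxy.
mkdir -p ./index ./output
salmon index --transcripts /mnt/galaxy/files/001/dataset_1239.dat \
    --kmerLen 31 --threads 4 --index ./index --type quasi
ln -sf /mnt/galaxy/files/001/dataset_1240.dat ./single.fastq
salmon quant --index ./index --libType U --unmatedReads ./single.fastq \
    --output ./output --allowOrphans --threads 4
```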

scottx611x avatar Aug 24 '18 20:08 scottx611x

Maybe SLURM is killing your job because too little memory was allocated, and the error message is just really weird?

bgruening avatar Aug 24 '18 21:08 bgruening

@bgruening So I've tried some runs today with higher memory configurations and can still reproduce the segfault. I'm going to press on and try to write up a reproducer for @dpryan79 here.

salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem=25000

  • scontrol show job 94
JobId=94 Name=g990_salmon_refinery_stemcellcommons_org
   UserId=galaxy(1001) GroupId=users(100)
   Priority=4294901667 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:07:32 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2018-08-27T15:36:41 EligibleTime=2018-08-27T15:36:41
   StartTime=2018-08-27T15:36:41 EndTime=2018-08-27T15:44:13
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=main AllocNode:Sid=ip-172-31-24-127:21595
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=w19
   BatchHost=w19
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=25000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/mnt/galaxy/tmp/job_working_directory/000/990
  • Galaxy stderr
Fatal error: Exit code 139 ()
...
/mnt/galaxy/tmp/job_working_directory/000/990/tool_script.sh: line 50:  5713 Segmentation fault      (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
  • syslog
ip-172-31-30-93 kernel: [ 681.083866] salmon[4167]: segfault at 2641a ip 00007fe2fcdc2dca sp 00007fff27128b90 error 4 in libtbb.so.2[7fe2fcda0000+37000]

salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem=100000

  • scontrol show job 98
JobId=98 Name=g994_salmon_refinery_stemcellcommons_org
   UserId=galaxy(1001) GroupId=users(100)
   Priority=4294901663 Account=(null) QOS=(null)
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:08:19 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2018-08-27T20:06:23 EligibleTime=2018-08-27T20:06:23
   StartTime=2018-08-27T20:06:23 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=main AllocNode:Sid=ip-172-31-24-127:2236
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=w21
   BatchHost=w21
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryCPU=100000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/mnt/galaxy/tmp/job_working_directory/000/994
  • Galaxy stderr
Fatal error: Exit code 139 ()
...
/mnt/galaxy/tmp/job_working_directory/000/994/tool_script.sh: line 50:  7495 Segmentation fault      (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
  • syslog
Aug 27 20:14:23 ip-172-31-16-139 kernel: [ 2134.447133] traps: salmon[7495] general protection ip:7ff9ce320dca sp:7ffd6e497020 error:0 in libtbb.so.2[7ff9ce2fe000+37000]

salmon 0.11.2 run with: NativeSpecification --ntasks=1 --nodes=1 --mem-per-cpu=100000

  • scontrol show job 99
JobId=99 Name=g995_salmon_refinery_stemcellcommons_org
   UserId=galaxy(1001) GroupId=users(100)
   Priority=4294901662 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:07:36 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2018-08-27T20:20:26 EligibleTime=2018-08-27T20:20:26
   StartTime=2018-08-27T20:20:26 EndTime=2018-08-27T20:28:02
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=main AllocNode:Sid=ip-172-31-24-127:7975
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=w21
   BatchHost=w21
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=100000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/mnt/galaxy/tmp/job_working_directory/000/995
  • Galaxy stderr
Fatal error: Exit code 139 ()
...
/mnt/galaxy/tmp/job_working_directory/000/995/tool_script.sh: line 50:  9700 Segmentation fault      (core dumped) salmon quant --index ./index --libType U --unmatedReads ./single.fastq --output ./output --allowOrphans --ma 2 --mp 4 --go 5 --ge 3 --minScoreFraction 0.65 --threads "${GALAXY_SLOTS:-4}" --incompatPrior 1e-20 --biasSpeedSamp 1 --fldMax 1000 --fldMean 200 --fldSD 80 --forgettingFactor 0.65 --maxReadOcc 100 --numBiasSamples 2000000 --numAuxModelSamples 5000000 --numPreAuxModelSamples 1000000 --numGibbsSamples 0 --numBootstraps 0 --consensusSlack 0 --vbPrior 0.001 --sigDigits 3
  • syslog
Aug 27 20:27:57 ip-172-31-16-139 kernel: [ 2949.318784] traps: salmon[9700] general protection ip:7fb66057cdca sp:7ffe1bf3a900 error:0 in libtbb.so.2[7fb66055a000+37000]
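For what it's worth, the "Exit code 139" that Galaxy reports in each run is the shell's standard encoding of death by signal: 128 + 11, where signal 11 is SIGSEGV. So it's consistent with the kernel segfault/general-protection traps in the syslog above, rather than a normal error-path exit. A minimal demonstration:

```shell
# Exit code 139 = 128 + 11 (SIGSEGV): the process was killed by a
# segmentation fault, not by a normal exit(1)-style failure.
bash -c 'kill -SEGV $$'   # subshell sends SIGSEGV to itself
echo "exit code: $?"      # prints "exit code: 139"
```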

scottx611x avatar Aug 27 '18 20:08 scottx611x

I can't reproduce this using 0.11.2 on Galaxy (18.05, not that that should matter) with a slurm (17.02.9) cluster. I've tried using both 20 cores and 1 core (in case something weird is going on with the threading) and both run fine. I used our cluster default of 6GB per core, which is overkill for this job. My guess is that the same tbb version is getting used in each version of salmon you're trying and that it got corrupted at some point. Are you spinning up a new CloudMan instance for these runs or are you restarting a saved instance? If you're not starting a brand new instance then try that, then you can avoid using the same possibly corrupted tbb install.
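One quick way to check that theory (a sketch, not specific to any particular install; the /bin/ls fallback is only there so the snippet runs on machines without salmon) is to compare which libtbb.so.2 the dynamic linker resolves for the salmon binary inside a SLURM job script versus an interactive shell:

```shell
# Show where the dynamic linker resolves libtbb.so.2 for the salmon
# on PATH; run this both inside the SLURM job script and interactively,
# then compare the paths. A stale or mismatched libtbb.so.2 would fit
# the corrupted-install theory.
BIN="$(command -v salmon || command -v ls)"  # fallback keeps the snippet runnable anywhere
echo "checking: $BIN"
ldd "$BIN" | grep -i tbb || echo "no libtbb dependency listed for $BIN"
```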

dpryan79 avatar Aug 27 '18 22:08 dpryan79

@dpryan79 Thanks for trying to reproduce, I really appreciate this. We're currently bringing up CloudMan instances derived from shared cluster strings. I'll try to bring up a fresh CloudMan instance and try to see the same behavior that you are.

scottx611x avatar Aug 28 '18 15:08 scottx611x

Did this ever get tracked down? We are having a situation where salmon seems to segfault whenever using SLURM (this time it's salmon index that segfaults, though). Wondering if you figured out a solution.

nsheff avatar Dec 06 '19 22:12 nsheff

@nsheff Sorry, I was never able to dig into this further.

scottx611x avatar Dec 09 '19 17:12 scottx611x

Also getting a segmentation fault. Any progress on this? This is salmon v1.3.0, installed with conda or using the binary, running under SLURM. I do not get a segmentation fault if I pass only a single file, but I do if I pass two files.

$  ./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libType U -t GRCh38_latest_rna.fa -a data/processed/bwa-mem/SRR10571655.sam data/processed/bwa-mem/SRR10571656.sam -o _tmp/ 
Version Info Exception: server did not respond before timeout
# salmon (alignment-based) v1.3.0
# [ program ] => salmon 
# [ command ] => quant 
# [ threads ] => { 32 }
# [ libType ] => { U }
# [ targets ] => { GRCh38_latest_rna.fa }
# [ alignments ] => { data/processed/bwa-mem/SRR10571655.sam data/processed/bwa-mem/SRR10571656.sam }
# [ output ] => { _tmp/ }
Logs will be written to _tmp/logs
[2020-10-12 16:13:21.969] [jointLog] [info] setting maxHashResizeThreads to 32
[2020-10-12 16:13:21.969] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
Library format { type:single end, relative orientation:none, strandedness:unstranded }
[2020-10-12 16:13:21.969] [jointLog] [info] numQuantThreads = 26
parseThreads = 6
Checking that provided alignment files have consistent headers . . . done
Populating targets from aln = "data/processed/bwa-mem/SRR10571655.sam", fasta = "GRCh38_latest_rna.fa" . . .done
[2020-10-12 16:13:26.979] [jointLog] [info] replaced 5 non-ACGT nucleotides with random nucleotides




processed 103000000 reads in current round[1]    1994 segmentation fault (core dumped)  ./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libTyp

Always at 103000000 reads.

izaakm avatar Oct 12 '20 20:10 izaakm

Hi @izaakm,

This segfault is unlikely to be related to the issue here, since that one happened in "mapping mode" (salmon performing the mapping itself), and yours is happening in alignment-based mode (you're feeding SAM files to salmon). Does the segfault go away when you provide either of the SAM files to salmon on its own? That is, does it run to completion with both data/processed/bwa-mem/SRR10571655.sam and data/processed/bwa-mem/SRR10571656.sam individually? Also, what happens if you combine them via a pipe (i.e. something like):

./src/salmon-latest_linux_x86_64/bin/salmon quant --threads $(nproc) --libType U -t GRCh38_latest_rna.fa -a <(cat data/processed/bwa-mem/SRR10571655.sam <(samtools view data/processed/bwa-mem/SRR10571656.sam)) -o _tmp/ 

the double redirect is just to make sure the header isn't included in the second sam file. Also, is the reference that you are passing to the -t option identical to the one with which bwa-mem was run? If the problem persists, we might need the sam/bam files to track it down further, since I imagine it may be data-dependent.
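One way to verify the same-header assumption before concatenating (a sketch with placeholder names a.sam and b.sam for the two real inputs; `samtools view -H` is the more robust way to pull headers, but grepping for leading '@' lines works for a quick check):

```shell
# Compare the header lines ('@'-prefixed) of two SAM files; identical
# headers are what makes concatenating their records safe.
# a.sam / b.sam are placeholders for the two real inputs.
if diff <(grep '^@' a.sam) <(grep '^@' b.sam) >/dev/null; then
    echo "headers match"
else
    echo "headers differ"
fi
```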

--Rob

rob-p avatar Oct 12 '20 20:10 rob-p

It does run with each of the two files separately, but when I try the command with the double redirect I get a message like the one below for many (possibly all?) of the sequences in the reference, and quant.sf is empty (except for the header).

[2020-10-12 17:05:47.406] [jointLog] [warning] Transcript XM_024446103.1 appears in the reference but did not appear in the BAM

izaakm avatar Oct 12 '20 21:10 izaakm

That is interesting. The double redirect was meant to take all alignment records from the second SAM file and simply concatenate them onto the first. Assuming the SAM files contain the same header, this should be OK (it's simply another way to treat them as a single input). However, this warning suggests that there were references in the file passed to -t that did not have a corresponding entry in the SAM file. Yet, with the redirect, the first SAM file should contain the full header. I don't have a clear understanding of why this would happen yet.

rob-p avatar Oct 12 '20 21:10 rob-p