generateDecoyTranscriptome.sh fails with "21 Killed"
I've made a Docker container for SalmonTools (https://quay.io/repository/comp-bio-aging/salmon-tools). However, I constantly get:
/opt/SalmonTools/scripts/generateDecoyTranscriptome.sh: line 105: 21 Killed $mashmap -r reference.masked.genome.fa -q $txpfile -t $threads --pi 80 -s 500
I run it on a 32-core machine with 64 GB of RAM, using the Ensembl human genome. I think something may be wrong in the bash script itself:
/opt/SalmonTools/scripts/generateDecoyTranscriptome.sh: line 105: 21 Killed $mashmap -r reference.masked.genome.fa -q $txpfile -t $threads --pi 80 -s 500
***************
*** ABORTED ***
***************
An error occurred. Exiting...
The command is:
/opt/SalmonTools/scripts/generateDecoyTranscriptome.sh -a /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.96.gtf -g /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.dna.primary_assembly.fa -t /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.cdna.all.fa -j 16 -o output
The stdout file is:
*** getDecoy ***
****************
-a <Annotation GTF file> = /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.96.gtf
-g <Genome fasta> = /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.dna.primary_assembly.fa
-t <Transcriptome fasta> = /cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.cdna.all.fa
-j <Concurrency level> = 16
-o <Output files Path> = output
[1/10] Extracting exonic features from the gtf
[2/10] Masking the genome fasta
[3/10] Aligning transcriptome to genome
>>>>>>>>>>>>>>>>>>
Reference = [reference.masked.genome.fa]
Query = [/cromwell-executions/decoy/9f2ca769-5a26-4149-a40c-ecc606e9b76c/call-generate/inputs/-848260311/Homo_sapiens.GRCh38.cdna.all.fa]
Kmer size = 16
Window size = 5
Segment length = 500 (read split allowed)
Alphabet = DNA
Percentage identity threshold = 80%
Mapping output file = mashmap.out
Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
Execution threads = 16
>>>>>>>>>>>>>>>>>>
INFO, skch::Sketch::build, minimizers picked from reference = 938129647
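I think it's related to https://github.com/COMBINE-lab/SalmonTools/issues/5. The problem is most likely memory usage: a "Killed" message from the shell means the process received SIGKILL, which is what the kernel's out-of-memory killer sends. We've raised the issue on mashmap's repo.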
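One way to test the memory hypothesis, as a minimal sketch: a process terminated by SIGKILL exits with status 137 (128 + 9), so a small wrapper can flag that case. The function name `run_and_check` is hypothetical and not part of SalmonTools; it only reports the exit status and cannot prove that the out-of-memory killer was responsible.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SalmonTools): run a command and
# print a hint when it was terminated by SIGKILL.
run_and_check() {
    local status=0
    # Run the command exactly as given and capture its exit status.
    "$@" || status=$?
    # 137 = 128 + 9, i.e. the process was terminated by signal 9 (SIGKILL).
    if [ "$status" -eq 137 ]; then
        echo "exit status 137: killed by SIGKILL, possibly out of memory" >&2
    fi
    return "$status"
}

# Example usage (assumes the script and input files from the report above):
# run_and_check /opt/SalmonTools/scripts/generateDecoyTranscriptome.sh -a ... -j 16 -o output
```

If the container was started with plain Docker, `docker inspect --format '{{.State.OOMKilled}}' <container>` also reports whether the container itself hit its memory limit; whether that applies here depends on how Cromwell launches the container.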
I have 64 GB of RAM; is that not enough? Also, why did you choose mashmap? It has not been updated for a year. Why not minimap2, which is fast, uses less memory, and works well for both short and long reads?