minimap2
splice sites in junc-bed file to override default settings
I think the information in the junc-bed file can be better utilized by minimap2 in dealing with cases that deviate from the default settings. Two such cases:
- When there are non-consensus splice junctions in the junc-bed file, minimap2 should be able to use those instead of introducing small indels to generate the alignment with consensus splice sites.
- When there is an intron that is >200kb (the default max for intron length) in the junc-bed file, minimap2 should use that information to generate an alignment with a large intron.
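For anyone wanting to check which entries in their own junction file fall into the second case, here is a minimal sketch. It assumes the file is a tab-separated BED where each row is a single intron interval (columns 2 and 3 are the 0-based start and exclusive end); a BED12 gene annotation would need its blocks unpacked first. The function names and the example rows are hypothetical and are not taken from the attached file.

```python
import gzip

# 200 kb is the default maximum intron length quoted in this issue.
DEFAULT_MAX_INTRON = 200_000


def long_junctions(lines, max_intron=DEFAULT_MAX_INTRON):
    """Yield (chrom, start, end, length) for intron rows longer than max_intron.

    Assumption: one intron per row, BED-style 0-based start, exclusive end.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        length = end - start
        if length > max_intron:
            yield chrom, start, end, length


def open_bed(path):
    """Open a plain or gzip-compressed BED file in text mode."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


# Illustrative rows only (made-up coordinates).
example_rows = [
    "chr2\t100\t300100\tj1\t0\t+",
    "chr2\t100\t5000\tj2\t0\t+",
]
flagged = list(long_junctions(example_rows))
```

With a real file, something like `with open_bed("splice_junctions.bed.gz") as fh: print(list(long_junctions(fh)))` would list the junctions that exceed the default limit.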
A couple of specific examples to demonstrate this:
The splice junctions file and the query FASTA file are attached.
Chromosome sequence can be downloaded from NCBI FTP path as shown below:
curl -O "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_assembly_structure/Primary_Assembly/assembled_chromosomes/FASTA/chr2.fna.gz"
minimap2 was executed as follows:
~/bin/minimap2 -ax splice -C 5 --eqx --MD --cs --junc-bed splice_junctions.bed.gz chr2.fna.gz query.fa.gz > aligns.sam
The query gnl|SRA|SRR1803611.121425.1 is expected to align to the subject with non-consensus splice sites. These are in the splice_junctions.bed file. However, minimap2 aligns this query with consensus splice sites by introducing a 3 nt deletion.
The query gnl|SRA|SRR1803617.262344.1 is expected to align to the subject with an intron >200kb which, again, is in the splice_junctions.bed file. However, minimap2 aligns this query with a 570 nt unaligned tail.
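Both symptoms (a small deletion next to an intron, and a long unaligned tail) can be read off the CIGAR string in aligns.sam. The following is a rough sketch of how one might summarize them, assuming standard SAM CIGAR operators and a 1-based POS field. The function name `summarize_cigar` and the example CIGAR are hypothetical; the CIGAR is synthetic and is not copied from the actual alignments.

```python
import re

# One CIGAR element: a length followed by a SAM operator
# (= and X appear because the command above uses --eqx).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar, pos):
    """Summarize introns, deletions and clipped ends of one alignment.

    cigar: SAM CIGAR string.
    pos:   1-based leftmost reference position (SAM POS field).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    introns, deletions = [], []
    ref = pos
    for n, op in ops:
        if op == "N":
            # Intron: 1-based inclusive reference start/end plus length.
            introns.append((ref, ref + n - 1, n))
        elif op == "D":
            deletions.append((ref, n))
        # Only these operators consume reference bases.
        if op in "MDN=X":
            ref += n
    # Clipped (unaligned) query bases at either end of the alignment.
    left_clip = ops[0][0] if ops and ops[0][1] in "SH" else 0
    right_clip = ops[-1][0] if len(ops) > 1 and ops[-1][1] in "SH" else 0
    return {
        "introns": introns,
        "deletions": deletions,
        "left_clip": left_clip,
        "right_clip": right_clip,
    }


# Synthetic example: a 3 nt deletion, one intron, and a 570 nt clipped tail.
summary = summarize_cigar("50=3D20=1000N30=570S", 101)
```

Comparing the reported intron coordinates against the rows of the junction file would show whether a given alignment used an annotated junction or a nearby consensus site.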
Thanks. Very good suggestions. I will consider this.
Much appreciated. I'd be happy to provide additional examples, and help with review/testing if you need.