bamsnap
bamsnap copied to clipboard
Cant manage to get working with alternative FASTA
I was eager to try your tool and the examples in the repository work well. I tried directly plotting from your examples as well as re-alignment (minimap2) on hg38 with subsequent plotting. In that case both without and with provided fasta sequence worked.
Whenever I try though to use a non-human reference genome, I get an error :
2021-01-18 10:51:57,140 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
refseq = rseq.get_refseq(pos1)
File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
refseq = self.get_refseq_from_localfasta(pos1)
File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
refseq[gpos+1] = seq[i]
IndexError: string index out of range
In order to assure that is should work I took really headers such as ">sequence1",">sequence2" and so on and verified that reads mapped actually on provided positions but it always gave the same error. Any hints how to fix that, or plans to add to your test repository some user reference sequences to make it easier to understand what is the proper syntax in that case ?
Can I second this issue?
I want to use this to visualize reads of SARS-CoV-2 with reference MN908947.3
The command:
$ bamsnap -ref MN908947.3.fasta -bam test.primertrim.sorted.bam -pos MN908947.3:241 -out test.bamsnap.png
The error:
2021-01-19 11:16:15,958 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
refseq = rseq.get_refseq(pos1)
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
refseq = self.get_refseq_from_localfasta(pos1)
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
refseq[gpos+1] = seq[i]
IndexError: string index out of range
Looking at the code, this part seems to be source of the error:
def get_refseq_from_localfasta(self, pos1):
spos = pos1['g_spos']-self.opt['margin'] - 500
epos = pos1['g_epos']+self.opt['margin'] + 1 + 500
seq = self.get_refseq_from_fasta(pos1['chrom'], spos, epos, self.opt['ref_index_rebuild'])
i = 0
refseq = {}
for gpos in range(spos, epos):
refseq[gpos+1] = seq[i]
i += 1
return refseq
Where we can see already that start and end-position take the provided position and shift by 500bp + the given margin (default 50bp) in order to find anything to plot at all. @erinyoung Might that be the issue in your case, that since you have the position 241bp that start would be 241 - 500 -50bp = -309bp which would be out of range ?
Will have to go back to mine and see whether that applies - certainly not for all which I tested.....
This still did not work:
$ bamsnap -ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png
2021-02-08 15:47:54,183 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 229, in run_process_drawplot_bamlist
imagefname = bsplot.drawplot_bamlist(pos1, image_w, bamlist, xscale, refseq)
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 475, in drawplot_bamlist
ia = self.append_geneplot_image(ia, pos1, image_w, xscale)
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 429, in append_geneplot_image
geneplot = GenePlot(pos1['chrom'], pos1['g_spos'], pos1['g_epos'], xscale,
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 101, in __init__
self.load_gene_structure()
File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 114, in load_gene_structure
for rec in self.gene_annot_tb.querys(pos_str):
tabix.TabixError: query failed
I was reading https://github.com/parklab/bamsnap/issues/11#issue-785047461 and finally got bamsnap working for me:
bamsnap -draw coordinates bamplot coverage base \
-ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png
I guess I just need to be wary of the edges.
Encountered similar issues when drawing short chromosomes and also near chromosome edges. I made some edits in a branch in a fork. (branch cli_margin)
An example synthetic fasta sequence and synthetic alignment:
>random_small
GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG
GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT
@SQ SN:random_small LN:120
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -o example.sam small.fa small.fastq
random_small-1 0 random_small 1 60 60M * 0 0 GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss NM:i:0MD:Z:60 AS:i:60 XS:i:0
random_small-2 0 random_small 61 60 60M * 0 0 GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss NM:i:0MD:Z:60 AS:i:60 XS:i:0
The current version gives IndexError on this data. In the edits, setting -pos random_small:1-120
(entire chromosome) and -margin 0
, it becomes possible to draw the chromosome edge-to-edge.
When -margin
is set to nonzero and would cause the genomic region off the chromosome boundaries, the alignment becomes left-justified and leaves empty space on the right.
The examples on readthedocs also seem to generate images (sometimes with one base different on the right). Does this seem reasonable?
@danielmsk - I'd be happy to send a pull request with this. There is potential these changes may break other things that I may not be aware of. It would be great to get your input.
This is still an issue and really undermines bamsnap scope and usefulness.
We added 1000 to the start and subtracted 1000 from the end position and I really really hate that this worked.