bamsnap icon indicating copy to clipboard operation
bamsnap copied to clipboard

Cant manage to get working with alternative FASTA

Open ebioman opened this issue 4 years ago • 6 comments

I was eager to try your tool and the examples in the repository work well. I tried directly plotting from your examples as well as re-alignment (minimap2) on hg38 with subsequent plotting. In that case both without and with provided fasta sequence worked.

Whenever I try though to use a non-human reference genome, I get an error :

2021-01-18 10:51:57,140 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
    refseq = rseq.get_refseq(pos1)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
    refseq = self.get_refseq_from_localfasta(pos1)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
    refseq[gpos+1] = seq[i]
IndexError: string index out of range

In order to assure that is should work I took really headers such as ">sequence1",">sequence2" and so on and verified that reads mapped actually on provided positions but it always gave the same error. Any hints how to fix that, or plans to add to your test repository some user reference sequences to make it easier to understand what is the proper syntax in that case ?

ebioman avatar Jan 18 '21 11:01 ebioman

Can I second this issue?

I want to use this to visualize reads of SARS-CoV-2 with reference MN908947.3

The command:

$ bamsnap -ref MN908947.3.fasta -bam test.primertrim.sorted.bam -pos MN908947.3:241 -out test.bamsnap.png 

The error:

2021-01-19 11:16:15,958 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
    refseq = rseq.get_refseq(pos1)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
    refseq = self.get_refseq_from_localfasta(pos1)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
    refseq[gpos+1] = seq[i]
IndexError: string index out of range

erinyoung avatar Jan 19 '21 18:01 erinyoung

Looking at the code, this part seems to be source of the error:

    def get_refseq_from_localfasta(self, pos1):
        spos = pos1['g_spos']-self.opt['margin'] - 500
        epos = pos1['g_epos']+self.opt['margin'] + 1 + 500
        seq = self.get_refseq_from_fasta(pos1['chrom'], spos, epos, self.opt['ref_index_rebuild'])
        i = 0
        refseq = {}
        for gpos in range(spos, epos):
            refseq[gpos+1] = seq[i]
            i += 1
        return refseq

Where we can see already that start and end-position take the provided position and shift by 500bp + the given margin (default 50bp) in order to find anything to plot at all. @erinyoung Might that be the issue in your case, that since you have the position 241bp that start would be 241 - 500 -50bp = -309bp which would be out of range ?

Will have to go back to mine and see whether that applies - certainly not for all which I tested.....

ebioman avatar Jan 21 '21 07:01 ebioman

This still did not work:

$ bamsnap -ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png
 
2021-02-08 15:47:54,183 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 229, in run_process_drawplot_bamlist
    imagefname = bsplot.drawplot_bamlist(pos1, image_w, bamlist, xscale, refseq)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 475, in drawplot_bamlist
    ia = self.append_geneplot_image(ia, pos1, image_w, xscale)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 429, in append_geneplot_image
    geneplot = GenePlot(pos1['chrom'], pos1['g_spos'], pos1['g_epos'], xscale,
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 101, in __init__
    self.load_gene_structure()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 114, in load_gene_structure
    for rec in self.gene_annot_tb.querys(pos_str):
tabix.TabixError: query failed

I was reading https://github.com/parklab/bamsnap/issues/11#issue-785047461 and finally got bamsnap working for me:

bamsnap -draw coordinates bamplot coverage base \
     -ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png

I guess I just need to be wary of the edges.

erinyoung avatar Feb 08 '21 23:02 erinyoung

Encountered similar issues when drawing short chromosomes and also near chromosome edges. I made some edits in a branch in a fork. (branch cli_margin)

An example synthetic fasta sequence and synthetic alignment:

>random_small
GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG
GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT
@SQ	SN:random_small	LN:120
@PG	ID:bwa	PN:bwa	VN:0.7.17-r1188	CL:bwa mem -o example.sam small.fa small.fastq
random_small-1	0	random_small	1	60	60M	*	0	0	GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG	ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss	NM:i:0MD:Z:60	AS:i:60	XS:i:0
random_small-2	0	random_small	61	60	60M	*	0	0	GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT	ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss	NM:i:0MD:Z:60	AS:i:60	XS:i:0

The current version gives IndexError on this data. In the edits, setting -pos random_small:1-120 (entire chromosome) and -margin 0, it becomes possible to draw the chromosome edge-to-edge.

entire-chromosome

When -margin is set to nonzero and would cause the genomic region off the chromosome boundaries, the alignment becomes left-justified and leaves empty space on the right.

margin-outside

The examples on readthedocs also seem to generate images (sometimes with one base different on the right). Does this seem reasonable?

@danielmsk - I'd be happy to send a pull request with this. There is potential these changes may break other things that I may not be aware of. It would be great to get your input.

tkonopka avatar Feb 16 '21 20:02 tkonopka

This is still an issue and really undermines bamsnap scope and usefulness.

gtrichard avatar Sep 01 '22 16:09 gtrichard

We added 1000 to the start and subtracted 1000 from the end position and I really really hate that this worked.

lskatz avatar Sep 06 '22 14:09 lskatz