reg-gen
reg-gen copied to clipboard
Re #127 AttributeError: 'GenomicSignal' object has no attribute 'bam'
Hi, I wonder what the solution to #127 was or if it's supposed to be taken care of by now and done. I'm getting the same type of error running a command like this (installed HINT a few days ago):
rgt-hint tracks --bc --bigWig --organism=hg19 xx.bam xxx.narrowPeak.sorted --output-location=$HINTOUT --output-prefix=track_$PEAKCALLER.REF_chr$nr Traceback (most recent call last): File "/home/cbmr/tkj375/.local/bin/rgt-hint", line 8, in sys.exit(main()) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/Main.py", line 95, in main args.func(args) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/Tracks.py", line 59, in tracks_run get_bc_tracks(args) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/Tracks.py", line 190, in get_bc_tracks min_length=None, max_length=None, strand=False) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/signalProcessing.py", line 911, in get_bc_signal_by_fragment_length for read in self.bam.fetch(ref, p1, p2): AttributeError: 'GenomicSignal' object has no attribute 'bam'
This only occurs for some of my data, which are grouped by chromosome (human, hg19); I don't get the error for eg chromosome 3,5,12 but for 1,6,7.
Kind regards, Christian G
Hi again,
I traced this back to a dependency on the start of the first region ("interesting genomic region") returned from the peak-calling (ie first record of the sorted narrowPeak file). If the start is 0 (or just <= window it seems) the code will enter a an if-clause where self.bam... is called; it's line 890-925 in signalProcessing.py:
" def get_bc_signal_by_fragment_length(self, ref, start, end, bam, fasta, bias_table, forward_shift, reverse_shift, min_length=None, max_length=None, strand=True): # Parameters window = 50 defaultKmerValue = 1.0
# Initialization
fBiasDict = bias_table[0]
rBiasDict = bias_table[1]
k_nb = len(list(fBiasDict.keys())[0])
p1 = start
p2 = end
p1_w = p1 - (window // 2)
p2_w = p2 + (window // 2)
p1_wk = p1_w - int(k_nb / 2.)
p2_wk = p2_w + int(k_nb / 2.)
if (p1 <= 0 or p1_w <= 0 or p2_wk <= 0):
# Return raw counts
signal = [0.0] * (p2 - p1)
for read in self.bam.fetch(ref, p1, p2):
# check if the read is unmapped, according to issue #112
if read.is_unmapped:
continue
if not read.is_reverse:
cut_site = read.pos + forward_shift
if p1 <= cut_site < p2:
signal[cut_site - p1] += 1.0
else:
cut_site = read.aend + reverse_shift - 1
if p1 <= cut_site < p2:
signal[cut_site - p1] += 1.0
return signal
"
When out-commenting the loop that starts "for read in self.bam.fetch(ref, p1, p2):" the code runs (ie the error is seemingly avoided). But maybe the result is not quite as what was intended (though I guess this only hits this very first "interesting genomic region").
Best, Christian
Nitty-gritty detail, but I meant: "If the start is 0 (or just <= window/2. it seems)"
Hi @ceegeeCode
Thanks for reporting this error.
The reason why we use if-clause is if the start of a region is out of chromosome size, it's not possible to do bias correction, because we can not retrieval a 8-mer from the bias table. So in this case, I decided to use the raw signal.
we will fix this in the next release.
Best, Li
Hi Li,
thanks for the swift answer! If I understand it, just setting the signal to 0 for these start regions (what I did by out-commenting that loop) should only be of little harm.
Best regards, Christian
Hi Christian,
thanks for the swift answer! If I understand it, just setting the signal to 0 for these start regions (what I did by out-commenting that loop) should only be of little harm.
exactly, usually, these regions are pretty rare and they shouldn't have an impact on signal generation. if you feel something wrong with your results, just let me know so I can test it, :)
Best, Li