reg-gen icon indicating copy to clipboard operation
reg-gen copied to clipboard

Re #127 AttributeError: 'GenomicSignal' object has no attribute 'bam'

Open ceegeeCode opened this issue 4 years ago • 6 comments

Hi, I wonder what the solution to #127 was or if it's supposed to be taken care of by now and done. I'm getting the same type of error running a command like this (installed HINT a few days ago):

rgt-hint tracks --bc --bigWig --organism=hg19 xx.bam xxx.narrowPeak.sorted --output-location=$HINTOUT --output-prefix=track_$PEAKCALLER.REF_chr$nr Traceback (most recent call last): File "/home/cbmr/tkj375/.local/bin/rgt-hint", line 8, in sys.exit(main()) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/Main.py", line 95, in main args.func(args) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/Tracks.py", line 59, in tracks_run get_bc_tracks(args) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/Tracks.py", line 190, in get_bc_tracks min_length=None, max_length=None, strand=False) File "/home/cbmr/tkj375/.local/lib/python3.6/site-packages/rgt/HINT/signalProcessing.py", line 911, in get_bc_signal_by_fragment_length for read in self.bam.fetch(ref, p1, p2): AttributeError: 'GenomicSignal' object has no attribute 'bam'

This only occurs for some of my data, which are grouped by chromosome (human, hg19); I don't get the error for eg chromosome 3,5,12 but for 1,6,7.

Kind regards, Christian G

ceegeeCode avatar Sep 01 '20 08:09 ceegeeCode

Hi again,

I traced this back to a dependency on the start of the first region ("interesting genomic region") returned from the peak-calling (ie first record of the sorted narrowPeak file). If the start is 0 (or just <= window it seems) the code will enter a an if-clause where self.bam... is called; it's line 890-925 in signalProcessing.py:

" def get_bc_signal_by_fragment_length(self, ref, start, end, bam, fasta, bias_table, forward_shift, reverse_shift, min_length=None, max_length=None, strand=True): # Parameters window = 50 defaultKmerValue = 1.0

    # Initialization
    fBiasDict = bias_table[0]
    rBiasDict = bias_table[1]
    k_nb = len(list(fBiasDict.keys())[0])
    p1 = start
    p2 = end
    p1_w = p1 - (window // 2)
    p2_w = p2 + (window // 2)
    p1_wk = p1_w - int(k_nb / 2.)
    p2_wk = p2_w + int(k_nb / 2.)

    if (p1 <= 0 or p1_w <= 0 or p2_wk <= 0):
        # Return raw counts
        signal = [0.0] * (p2 - p1)
        for read in self.bam.fetch(ref, p1, p2):
            # check if the read is unmapped, according to issue #112
            if read.is_unmapped:
                continue

            if not read.is_reverse:
                cut_site = read.pos + forward_shift
                if p1 <= cut_site < p2:
                    signal[cut_site - p1] += 1.0
            else:
                cut_site = read.aend + reverse_shift - 1
                if p1 <= cut_site < p2:
                    signal[cut_site - p1] += 1.0

        return signal

"

When out-commenting the loop that starts "for read in self.bam.fetch(ref, p1, p2):" the code runs (ie the error is seemingly avoided). But maybe the result is not quite as what was intended (though I guess this only hits this very first "interesting genomic region").

Best, Christian

ceegeeCode avatar Sep 01 '20 09:09 ceegeeCode

Nitty-gritty detail, but I meant: "If the start is 0 (or just <= window/2. it seems)"

ceegeeCode avatar Sep 01 '20 09:09 ceegeeCode

Hi @ceegeeCode

Thanks for reporting this error.

The reason why we use if-clause is if the start of a region is out of chromosome size, it's not possible to do bias correction, because we can not retrieval a 8-mer from the bias table. So in this case, I decided to use the raw signal.

we will fix this in the next release.

Best, Li

lzj1769 avatar Sep 01 '20 10:09 lzj1769

Hi Li,

thanks for the swift answer! If I understand it, just setting the signal to 0 for these start regions (what I did by out-commenting that loop) should only be of little harm.

Best regards, Christian

ceegeeCode avatar Sep 01 '20 13:09 ceegeeCode

Hi Christian,

thanks for the swift answer! If I understand it, just setting the signal to 0 for these start regions (what I did by out-commenting that loop) should only be of little harm.

exactly, usually, these regions are pretty rare and they shouldn't have an impact on signal generation. if you feel something wrong with your results, just let me know so I can test it, :)

Best, Li

lzj1769 avatar Sep 01 '20 13:09 lzj1769