pyBigWig

BigBed interval problem

Open jtd032 opened this issue 3 years ago • 7 comments

I am creating a list of histograms, one for each file, using the code below:

Imports

import numpy as np
import pyBigWig as bw
import matplotlib.pyplot as plt
import os

For Loop

directory = 'listed file path'
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    if os.path.isfile(f) and filename.endswith('.bb'):
        fp = bw.open(f,'r')
        chr = filename.replace('.bb','')
        max = fp.header()['maxVal']
        #print(fp.header())
        a = np.array(fp.entries(chr, 1, max),dtype=np.int64)
        plt.hist(a[:,2], bins='auto')  # arguments are passed to np.histogram
        plt.title("Histogram with 'auto' bins")
        #Text(0.5, 1.0, "Histogram with 'auto' bins")
        print(chr)
        plt.show()

The problem I am running into is retrieval of the maxVal from the header() call. It works for the first few graphs but raises an error on later files: (int() argument must be a string, a bytes-like object or a number, not 'NoneType'). Am I right in understanding that maxVal is the top end of the range of values for that file?

jtd032 avatar Dec 03 '21 18:12 jtd032

The maxVal is stored in the bigBed header. Could it be that it simply wasn't set for one of the files?
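
A minimal sketch of a more defensive way to fetch the entries, in case it helps narrow this down. It rests on two assumptions that are not confirmed in this thread: that maxVal in the header is a summary statistic of the data values rather than a coordinate, so the chromosome length from chroms() is used as the upper bound instead, and that entries() returns None (not an empty list) when nothing overlaps the requested range, which would explain the NoneType error once the result is handed to np.array. The helper name interval_bounds is made up for illustration.

```python
def interval_bounds(bb, chrom):
    """Return (start, end) tuples for every entry on `chrom`, or [] if none.

    `bb` is assumed to be an open bigBed handle from pyBigWig.open().
    """
    # chroms(name) gives the chromosome length, or None if the name is absent.
    length = bb.chroms(chrom)
    if length is None:
        return []
    # Assumption: entries() returns None when no entry overlaps the range.
    entries = bb.entries(chrom, 0, length, withString=False)
    return [] if entries is None else list(entries)
```

With this, the loop above could skip a file when the returned list is empty, and only build the NumPy array and histogram otherwise.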

dpryan79 avatar Dec 06 '21 15:12 dpryan79

All files return a maxVal when tested. chr10 was successful but chr11 was not: image. Error message: image

jtd032 avatar Dec 08 '21 18:12 jtd032

Can you make the file available to me? I can have a look then.

dpryan79 avatar Dec 16 '21 23:12 dpryan79

Hi,

I'd like to know how to save all of the entries to a file. Here is my code:

bb = pyBigWig.open('./PBMCs_HistoneMarks_Blueprint/Males_UMCG00025_H3K4me1.peak_calls.bigBed')
bb.entries('chrX', 16426, 156000962, withString=False)

How can I write out the result of bb.entries()? Also, for a bigBed object, how can I output the intervals for all chromosomes at once? I found that I need to specify start and end positions for each chromosome. And if I use a bigWig file, are the intervals I extract the same as for bigBed? I ask because, based on your description, the start and end positions are not required for bigWig files. Many thanks!

YunfengLUMC avatar Jan 23 '22 20:01 YunfengLUMC

I don't know that I ever put the logic into the .entries() function to have it fill in the chromosome bounds if nothing was supplied. I suppose that could be done, though the Python function is really just a thin wrapper over a C function, and C is less flexible about such things.

For outputting the results of bb.entries(), it's just a list of tuples, so something like the following would work:

for res in bb.entries('chr1', 10000000, 10020000):
    o.write("chr1\t{}\t{}\t{}\n".format(res[0], res[1], res[2]))
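
To cover every chromosome in one pass, the same idea can be wrapped in a loop over chroms(). This is only a sketch: write_all_entries is a hypothetical helper, not part of pyBigWig, and it assumes that chroms() with no argument returns a dict of chromosome names to lengths and that entries() returns None for a chromosome with no entries.

```python
def write_all_entries(bb, out):
    """Write every entry of an open bigBed handle to `out`, one per line.

    Returns the number of lines written.
    """
    written = 0
    # chroms() with no argument is assumed to return a {name: length} dict.
    for chrom, length in bb.chroms().items():
        # Use the full chromosome as the range: 0 up to its length.
        entries = bb.entries(chrom, 0, length)
        if entries is None:  # assumed return value when nothing overlaps
            continue
        for start, end, rest in entries:
            out.write("{}\t{}\t{}\t{}\n".format(chrom, start, end, rest))
            written += 1
    return written
```

Here `out` is any writable text file object, for example the `o` used above.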

dpryan79 avatar Jan 24 '22 13:01 dpryan79

Thanks for your detailed reply. Could I try this? I don't need the strings:

for res in bb.entries('chr1', 10000000, 10020000, withString=False):
    o.write("chr1".format(res[0], res[1], res[2]))

Best wishes!

YunfengLUMC avatar Jan 24 '22 13:01 YunfengLUMC

o.write("chr1\t{}\t{}\n".format(res[0], res[1])) in that case as an example.

dpryan79 avatar Jan 24 '22 14:01 dpryan79