cooler
metadata for cooler balance is hard to find
Currently, the balancing parameters are stored as metadata of a particular column of the bins table, which makes them really hard to find. While this is technically valid, in practice the metadata of individual columns in individual datasets is very hard to discover (for balance, it doesn't seem to be documented, or the documentation is hard to find). It would be really useful if at least cooler info could print all metadata of all columns.
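Until cooler info does this, note that a .cool file is plain HDF5, so the per-column metadata (stored as HDF5 attributes) can be dumped directly. A minimal sketch, assuming h5py is installed; `dump_attrs` is a hypothetical helper for illustration, not part of cooler's API. It walks the whole file, so for an .mcool it reports every resolution.

```python
import h5py


# Hypothetical helper (not part of cooler): collect HDF5 attributes everywhere.
def dump_attrs(path):
    """Return {object_path: {attr_name: value}} for the root group and every
    group or dataset in the file that carries at least one attribute."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root;
        # returning None lets the traversal continue.
        if len(obj.attrs):
            found[name] = dict(obj.attrs.items())

    with h5py.File(path, "r") as f:
        if len(f.attrs):
            found["/"] = dict(f.attrs.items())  # file-level metadata
        f.visititems(visit)
    return found
```

Calling dump_attrs("example.cool").get("bins/weight") should then show whatever balancing parameters were recorded on the weight column; the exact attribute names depend on the cooler version, so none are assumed here.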
On a related note, is it even possible to quickly check via the CLI whether a given .cool is balanced at all?
cooler info does not show it...
Right now, people in the lab check whether there is, e.g., a weight column in clr.bins()[:10]...
Are there maybe some HDF5 CLI tools that let you peek into a given cooler URI to see the table headers or something like that?
I'll leave this here, just in case: h5dump -n filename.cool lists the contents of the HDF5 file, e.g.:
HDF5 "blahblah_hg19.10000.cool" {
FILE_CONTENTS {
group /
group /bins
dataset /bins/chrom
dataset /bins/end
dataset /bins/start
dataset /bins/weight
group /chroms
dataset /chroms/length
dataset /chroms/name
group /indexes
dataset /indexes/bin1_offset
dataset /indexes/chrom_offset
group /pixels
dataset /pixels/bin1_id
dataset /pixels/bin2_id
dataset /pixels/count
}
}
If the dataset /bins/weight is present, balancing was at least attempted.
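Where the HDF5 command-line tools are not installed, roughly the same listing can be produced from Python. A sketch assuming h5py is available; `list_contents` is a hypothetical name, and the output is a list of (kind, path) pairs rather than h5dump's exact format.

```python
import h5py


# Hypothetical helper (not part of cooler): an h5dump -n style listing.
def list_contents(path):
    """Return (kind, path) pairs for every object below the root group."""
    entries = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            kind = "dataset"
        elif isinstance(obj, h5py.Group):
            kind = "group"
        else:
            kind = "other"  # e.g. committed datatypes
        entries.append((kind, "/" + name))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries
```

For example, ("dataset", "/bins/weight") in list_contents("filename.cool") performs the same check as looking for /bins/weight in the h5dump output.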
@sergpolly, there is cooler balance --check
Oh my, that's handy! I wish I had read the docs more carefully.
But this one would only look for a weight column in bins, right?
It would report the file as unbalanced if the balancing weights are in a column with a different name, wouldn't it?
As long as you know the name of the column, you can provide it as the --name parameter.
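The same idea, testing for a bins column under a configurable name, can be sketched in Python. This only illustrates the existence test and is not a reimplementation of cooler balance --check, which may check more than that. It assumes h5py and accepts either a plain path or a cooler URI of the form path::group; `has_bins_column` is a hypothetical name.

```python
import h5py


# Hypothetical helper (not part of cooler): does bins/<name> exist?
def has_bins_column(uri, name="weight"):
    """Return True if the cooler at `uri` has a bins column called `name`.

    `uri` is a plain file path or "path::group",
    e.g. "sample.mcool::/resolutions/10000".
    """
    path, _, group = uri.partition("::")
    with h5py.File(path, "r") as f:
        root = f[group] if group else f
        # Short-circuits to False if there is no bins group at this level.
        return "bins" in root and name in root["bins"]
```

Passing the URI of a single resolution matters for .mcool files, because the file root itself has no bins table.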
See the new cooler tree and cooler attrs commands in 0.8.
Could we have something similar in the Python API, pretty please? Again, the balancing parameters are nearly impossible to discover without googling.