cooler balance drops many interactions

Open conchoecia opened this issue 3 years ago • 2 comments

I have a cool file that appears to be fine - there are reads in pretty much every bin. Here it is as a ginteractions file:

NC_042997.1     0       1000000 NC_042997.1     0       1000000 63334
NC_042997.1     0       1000000 NC_042997.1     1000000 2000000 6563
NC_042997.1     0       1000000 NC_042997.1     2000000 3000000 3641
NC_042997.1     0       1000000 NC_042997.1     3000000 4000000 2913
NC_042997.1     0       1000000 NC_042997.1     4000000 5000000 2443
NC_042997.1     0       1000000 NC_042997.1     5000000 6000000 2192
NC_042997.1     0       1000000 NC_042997.1     6000000 7000000 1757
NC_042997.1     0       1000000 NC_042997.1     7000000 8000000 1460
NC_042997.1     0       1000000 NC_042997.1     8000000 9000000 1311
NC_042997.1     0       1000000 NC_042997.1     9000000 10000000        1348
NC_042997.1     0       1000000 NC_042997.1     10000000        11000000        706
NC_042997.1     0       1000000 NC_042997.1     11000000        12000000        517
NC_042997.1     0       1000000 NC_042997.1     12000000        13000000        485
.
.
.
et cetera

And here's an image after converting to mcool:

When I run cooler balance --force, the output has A LOT of bins dropped. Note how the file now starts at 11 Mb:

NC_042997.1     11000000        12000000        NC_042997.1     11000000        12000000        0.12287012107273232
NC_042997.1     11000000        12000000        NC_042997.1     14000000        15000000        0.009179471469380696
NC_042997.1     11000000        12000000        NC_042997.1     16000000        17000000        0.0022613667508674393
NC_042997.1     11000000        12000000        NC_042997.1     17000000        18000000        0.0017934892086176287
NC_042997.1     11000000        12000000        NC_042997.1     19000000        20000000        0.0015706112594678617
NC_042997.1     11000000        12000000        NC_042997.1     20000000        21000000        0.0016067216658818219
NC_042997.1     11000000        12000000        NC_042997.1     22000000        23000000        0.0026443289772690387
NC_042997.1     11000000        12000000        NC_042997.1     23000000        24000000        0.0017619224122503929
NC_042997.1     11000000        12000000        NC_042997.1     32000000        33000000        0.001725912256081153
NC_042997.1     11000000        12000000        NC_042997.1     33000000        34000000        0.0015621264682953127
NC_042997.1     11000000        12000000        NC_042997.1     39000000        40000000        0.0016796723318025406
NC_042997.1     11000000        12000000        NC_042997.1     40000000        41000000        0.0010349507443594623
NC_042997.1     11000000        12000000        NC_042997.1     42000000        43000000        0.0013117895675163343
NC_042997.1     11000000        12000000        NC_042997.1     43000000        44000000        0.0015743365358010337
NC_042997.1     11000000        12000000        NC_042997.1     44000000        45000000        0.0014398805814632809
NC_042997.1     11000000        12000000        NC_042997.1     49000000        50000000        0.0014603177220181285
NC_042997.1     11000000        12000000        NC_042997.1     51000000        52000000        0.0018351809986778198
.
.
.
et cetera

And here is the balanced matrix after converting to an mcool and visualizing.

Do you have any idea what could be going on? Seems like something is not working the way it should. Thank you!

Sep 12 '20 17:09 conchoecia

cooler balance tries to ensure convergence of the balancing algorithm by filtering out (ignoring) "misbehaving" bins. Misbehaving bins are ones with some sort of coverage issue: (1) too few interactions (controlled by the --min-count parameter), (2) not enough non-zero pixels (controlled by the --min-nnz parameter), (3) coverage that deviates too much from the rest of the bins (controlled by the --mad-max parameter). See https://cooler.readthedocs.io/en/latest/cli.html#cooler-balance

(1) and (2) are related, of course, but --min-nnz lets you avoid extreme situations where, e.g., there are a handful of super-bright pixels (so --min-count is satisfied) but all the others are zero. Judging by your raw heatmap, it does not look like you have that problem.

(3) may be tricky to understand at first, but it seems like exactly what you need to adjust. This filter first calculates a sort of "average bin coverage" per chromosome (the median, to be exact), and then checks whether the coverage of an individual bin deviates too much from that "average". "Too much" in this context is measured in MADs - median absolute deviations, aka the median deviation from the median (argh ...). Anyhow, the default --mad-max 5 is perhaps too stringent for your data - you could try something like --mad-max 10, or more ...
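A minimal sketch of the idea behind this filter, in plain NumPy. It is not cooler's actual code: the function name `flag_outlier_bins` and the toy coverage values are made up for illustration, and the real implementation may differ in details (for example, by transforming the marginals before computing the MAD).

```python
import numpy as np

def flag_outlier_bins(coverage, mad_max=5.0):
    """Return a boolean mask of bins whose coverage is more than
    `mad_max` MADs away from the median (hypothetical helper)."""
    coverage = np.asarray(coverage, dtype=float)
    # Use only non-empty bins to estimate the "typical" coverage.
    nonzero = coverage[coverage > 0]
    med = np.median(nonzero)
    # MAD: median of the absolute deviations from the median.
    mad = np.median(np.abs(nonzero - med))
    if mad == 0:
        # No spread to measure against; flag nothing in this sketch.
        return np.zeros(coverage.shape, dtype=bool)
    return np.abs(coverage - med) / mad > mad_max

# Toy per-bin coverage for one chromosome (made-up numbers).
toy_coverage = [100, 102, 98, 101, 99, 5, 100]
strict_mask = flag_outlier_bins(toy_coverage, mad_max=5)
loose_mask = flag_outlier_bins(toy_coverage, mad_max=100)
```

With a small `mad_max` the low-coverage bin is flagged; raising `mad_max` widens the accepted range, which is the effect suggested above for data with a broad coverage distribution.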

Also, you might want to explore a bit further why the coverage in your data has such a wide distribution. Is there a biological reason for that? What organism is this?

PS. The words "coverage" and "marginals" are used interchangeably in this context. They are roughly the sum of interactions along a row (or column) of the heatmap.

PPS. There is some code for calculating raw coverage from a binned cooler at https://github.com/mirnylab/cooltools/blob/master/cooltools/coverage.py if you wish to explore that further ... Or you can just calculate the sums of rows in the raw heatmap if the data is small enough to fit in memory.
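For the "sums of rows" route, here is a rough sketch that works directly on upper-triangle pixel triples (bin1, bin2, count), such as the rows of a ginteractions-style table converted to bin indices. The helper name `raw_coverage` is hypothetical, and the input below is only the first three rows from the raw listing in this issue, so the totals are partial and purely illustrative.

```python
import numpy as np

def raw_coverage(bin1, bin2, count, n_bins):
    """Row sums of a symmetric contact matrix stored as upper-triangle
    pixels (hypothetical helper, not part of cooler or cooltools)."""
    bin1 = np.asarray(bin1, dtype=np.int64)
    bin2 = np.asarray(bin2, dtype=np.int64)
    count = np.asarray(count, dtype=float)
    # Every pixel contributes to the row of its first bin ...
    cov = np.bincount(bin1, weights=count, minlength=n_bins)
    # ... and off-diagonal pixels also contribute to their second bin.
    off_diag = bin1 != bin2
    cov += np.bincount(bin2[off_diag], weights=count[off_diag],
                       minlength=n_bins)
    return cov

# First three pixels of NC_042997.1 at 1 Mb resolution, as bin indices.
cov = raw_coverage(bin1=[0, 0, 0], bin2=[0, 1, 2],
                   count=[63334, 6563, 3641], n_bins=3)
```

Plotting a histogram of the resulting per-bin values is one way to see how wide the coverage distribution is before choosing a --mad-max value.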

Sep 12 '20 19:09 sergpolly

Thanks for your response, Sergey. These are a few Hi-C libraries on SRA from Octopus sinensis. No one has published anything about A/B compartments or TADs in spiralians before, so I was taking a look. It could be that the library has too many short inserts, so the contact frequency decays very rapidly as distance increases. That would explain the wide distribution of coverage.

Edit/Update:

After plotting the z-score of the bins, this looks unlike any dataset that I've ever seen. I think this problem can be fixed by increasing the bin size.

Sep 12 '20 23:09 conchoecia

Marking as resolved. Please re-open if you are still encountering issues.

Jan 24 '24 16:01 nvictus
