HiCcompare

No MCC values in the filter_params plot

Open ashishjain1988 opened this issue 2 years ago • 9 comments

Hi,

I am trying to use the filter_params function to select the optimal A.min value for filtering. We are interested in contacts on chromosome 4. When I check the plot, it seems to be missing MCC values for some A values (approximately 2 to 8). Is there a reason why the package is not able to calculate these MCC values? Here is the plot that I got for chromosome 4.

Screenshot 2024-01-23 at 11 30 52 AM

ashishjain1988 avatar Feb 12 '24 17:02 ashishjain1988

Hi @ashishjain1988, it is hard to tell why some MCC values are missing. I wouldn't be concerned about it. It is more important to find acceptable true and false positive rate cutoffs. I'd be conservative and pick 10, but 7-8 is also OK. We already discussed that HiCcompare is robust to the choice of A https://github.com/dozmorovlab/HiCcompare/issues/29#issuecomment-1535572871 because small differences are unlikely to be detected as statistically significant. I'll keep an eye on missing MCC values and debug when I have an example.

mdozmorov avatar Feb 13 '24 00:02 mdozmorov

Hi @mdozmorov, thank you for your response. This data is more deeply sequenced than the previous one. One thing I want to ask about is the TPR and FPR. Based on this plot, it seems like the false positive rate is much higher than the true positive rate at A.min = 10. Is that still a good threshold? Also, the default threshold of 2 is not giving us any significant contacts.

ashishjain1988 avatar Feb 13 '24 16:02 ashishjain1988

I overlooked that the curves are inverted; this is indeed confusing. Here's the explanation from my student, @hamy12398:

Their plot can happen because it depends on the number of changes they set (in my example, I set numChanges to 30). MCC is a fraction whose denominator is built from products of sums of pairs of TP, TN, FP, and FN, so if that denominator happens to be 0, MCC is undefined. image
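
For reference, the textbook definition of the Matthews correlation coefficient makes the point about the denominator concrete. This is the standard formula, not something taken from the HiCcompare source, so how filter_params() computes or handles it internally is an assumption here:

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
```

The denominator is zero whenever any one of the four sums is zero, for example when nothing is called significant at a given A (TP + FP = 0) or when everything is (TN + FN = 0). In those cases the ratio is 0/0, which would plausibly show up as a gap in the MCC curve.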

What are the parameters you used for filter_params()? Can you try with numChanges = 30?

mdozmorov avatar Feb 13 '24 20:02 mdozmorov

I was actually carrying out the analysis at 25kbp resolution and, as mentioned in the manual, I proportionally increased numChanges to 2500 (filter_params(hic.list[[i]], numChanges = 2500)). Is that too much for 25kbp resolution? I will try numChanges = 30 too. Thanks!

ashishjain1988 avatar Feb 14 '24 15:02 ashishjain1988

Below is the plot I got using the filter_params function for chromosome 4. The resolution I used is 25kbp and numChanges = 30. It seems like all the results are FPR. image

ashishjain1988 avatar Feb 20 '24 14:02 ashishjain1988

It is hard to tell without seeing the data. Have you tried visualizing the individual matrices? It may be that the data is very sparse at 25 kb resolution.

mdozmorov avatar Feb 20 '24 16:02 mdozmorov

This is what the contact data looks like for individual samples. The scale is log2. image image

ashishjain1988 avatar Feb 20 '24 16:02 ashishjain1988

The data looks good. I still cannot say why your A plot looks strange. Try debugging the actual function. Again, the A threshold is not that critical; I would explore the MD plot, call differential interactions, and visualize them.

mdozmorov avatar Feb 20 '24 17:02 mdozmorov

Thanks! I will look into that.

ashishjain1988 avatar Feb 20 '24 17:02 ashishjain1988