MACS
Better control of cutoff levels for bdgpeakcall --cutoff-analysis
I'm running bdgpeakcall on a bedGraph file obtained from an input-normalised bigWig file (the operation used was subtraction). The minimum value in the file is approximately -5000 and the maximum approximately 1600, so when I run bdgpeakcall with --cutoff-analysis to see how many peaks different thresholds yield, I get this output:
```
score npeaks lpeaks avelpeak
1628.50 2 400 200.00
1537.51 3 600 200.00
1446.52 3 600 200.00
1355.53 6 1200 200.00
1264.54 7 1400 200.00
1173.55 8 1600 200.00
1082.56 8 1600 200.00
991.57 8 1600 200.00
900.58 8 1650 206.25
809.59 8 1800 225.00
718.60 8 1850 231.25
627.62 8 1950 243.75
536.63 12 2950 245.83
445.64 21 4800 228.57
354.65 64 13800 215.62
263.66 239 50350 210.67
172.67 1264 267400 211.55
81.68 8335 1857700 222.88
-9.31 8536 2980185963 349131.44
-100.30 497 1609818765 3239071.96
-191.29 322 1493852546 4639293.62
-282.28 250 1375339296 5501357.18
-373.27 241 1276799008 5297921.20
-464.26 215 1091708608 5077714.46
-555.24 203 962565827 4741703.58
-646.23 173 834599219 4824272.94
-737.22 157 833794369 5310792.16
-828.21 126 799328119 6343873.96
-919.20 103 799320255 7760390.83
-1010.19 89 799323455 8981162.42
-1101.18 84 787521955 9375261.37
-1192.17 72 787524255 10937836.88
-1283.16 55 742078594 13492338.07
-1374.15 51 742079994 14550588.12
-1465.14 42 606310444 14435962.95
-1556.13 33 465246139 14098367.85
-1647.12 32 465246489 14538952.78
-1738.11 32 465247089 14538971.53
-1829.09 31 465247139 15007972.23
-1920.08 26 465243889 17893995.73
-2011.07 23 465244139 20228006.04
-2102.06 24 465244389 19385182.88
-2193.05 23 465244639 20228027.78
-2284.04 22 465244839 21147492.68
-2375.03 22 465245589 21147526.77
-2466.02 22 465245989 21147544.95
-2557.01 19 465245439 24486602.05
-2648.00 16 465245639 29077852.44
-2738.99 15 465245689 31016379.27
-2829.98 13 333369089 25643776.08
-2920.97 13 333369089 25643776.08
-3011.96 13 333369089 25643776.08
-3102.95 12 333369339 27780778.25
-3193.94 12 333369439 27780786.58
-3284.92 11 333369489 30306317.18
-3375.91 10 333310289 33331028.90
-3466.90 10 333310289 33331028.90
-3557.89 8 124043389 15505423.62
-3648.88 8 124043389 15505423.62
-3739.87 6 123766200 20627700.00
-3830.86 6 123766250 20627708.33
-3921.85 6 123766250 20627708.33
-4012.84 6 123766300 20627716.67
-4103.83 6 123766300 20627716.67
-4194.82 6 123766350 20627725.00
-4285.81 5 123765950 24753190.00
-4376.80 5 123765950 24753190.00
-4467.79 5 123766000 24753200.00
-4558.77 4 123766050 30941512.50
-4649.76 4 123766050 30941512.50
-4740.75 4 123766050 30941512.50
-4831.74 3 123766100 41255366.67
-4922.73 2 123766200 61883100.00
-5013.72 1 123766250 123766250.00
-5104.71 1 123766250 123766250.00
```
Many of these results are useless: they explore negative cutoffs that end up lumping large stretches of the genome together into "peaks". More control over the cutoffs tested by --cutoff-analysis would give me better insight into the distribution of my data.
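As a stopgap, the negative-cutoff rows can be filtered out of the report after the fact. A minimal sketch (the column layout matches the output above; `filter_nonnegative` is just an illustrative helper, not part of MACS):

```python
def filter_nonnegative(lines):
    """Keep the header plus only the rows whose score (first column)
    is non-negative, given a cutoff-analysis report as a list of lines."""
    header, *rows = lines
    kept = [r for r in rows if float(r.split()[0]) >= 0]
    return [header] + kept

# Example with a few rows from the report above:
report = [
    "score npeaks lpeaks avelpeak",
    "1628.50 2 400 200.00",
    "81.68 8335 1857700 222.88",
    "-9.31 8536 2980185963 349131.44",
]
filtered = filter_nonnegative(report)
```

This only trims the report, of course; the analysis itself still wastes most of its steps on the negative range.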
One basic solution would be a "no-negative" option that stops --cutoff-analysis from considering negative thresholds, which seems relevant for data normalised like mine. A more flexible approach would be to let the user specify the minimum, maximum and step used to compute the cutoffs (which I assume is what --cutoff-analysis with --cutoff-analysis-steps already does, but using the min and max found in the file), allowing full control; I don't know how feasible that is.
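The requested option could amount to exposing the endpoints of the grid that --cutoff-analysis already appears to build (the ~91-point spacing between consecutive scores in the report above is consistent with an even grid over the file's min and max). A hypothetical sketch of that generation step with user-supplied bounds, where `lo` would come from an invented flag like `--cutoff-min` rather than the file:

```python
def cutoff_grid(lo, hi, steps):
    """Evenly spaced candidate cutoffs from hi down towards lo,
    mirroring an even grid over [lo, hi] with the given number of steps."""
    step = (hi - lo) / steps
    return [hi - i * step for i in range(steps)]

# With a user-supplied lower bound of 0 instead of the file minimum,
# no negative thresholds are ever tested:
grid = cutoff_grid(0.0, 1628.50, 20)
```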
Currently I'm rerunning the analysis with 500 cutoffs (--cutoff-analysis-steps 500), but more than half of them will still be unusable.
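A quick check of the range above suggests how wasteful this is: assuming the cutoffs are evenly spaced between the file's min and max, only about a quarter of them land at non-negative scores, regardless of how many steps are requested.

```python
# Observed extremes of the cutoff-analysis report above.
lo, hi = -5104.71, 1628.50

# Fraction of an even grid over [lo, hi] that falls at scores >= 0.
frac_nonneg = hi / (hi - lo)  # about 0.24
```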