Feature Request: new option -i and -I to complement -e and -E

Open malcook opened this issue 4 years ago • 6 comments

During development of a pipeline involving Genrich for integrating ATAC-seq with ChIP-seq for multiple marks, I wish to call peaks on only a few small regions. For this reason, it is desirable to be able to specify which chromosomes or BED regions to include.

The effective genome should then be the regions to include minus the regions to exclude.
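To make "include minus exclude" concrete, here is a minimal Python sketch of how the size of such an effective genome could be computed. It is not Genrich's implementation; the function names and the dictionary layout (chromosome name mapped to half-open BED-style intervals) are assumptions made for illustration.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def effective_length(include, exclude):
    """Count bases covered by `include` but not by `exclude`.

    Both arguments map a chromosome name to a list of half-open
    (start, end) intervals, as in BED files (hypothetical layout).
    """
    total = 0
    for chrom, regions in include.items():
        # Merged exclusions do not overlap, so their overlaps can be summed.
        blocked = merge(exclude.get(chrom, []))
        for start, end in merge(regions):
            covered = 0
            for b_start, b_end in blocked:
                overlap = min(end, b_end) - max(start, b_start)
                if overlap > 0:
                    covered += overlap
            total += (end - start) - covered
    return total
```

For example, including `chr8:0-1000` while excluding `chr8:100-200` and `chr8:150-300` leaves 800 bases, because the two excluded intervals merge into a single 200-base block.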

This would allow me to tell Genrich to analyze, e.g., chr8 only, minus any pre-computed global region blacklist.

Finally, being able to specify chromosomes to include or exclude using a regular expression would be great. One useful expression would be `-i '^chr\d+$'` to effectively remove (in the case of Ensembl zebrafish) chrM and any of the "unknown" chromosomal fragments matching "chrUn_*".
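Without such an option, one way to approximate the regular-expression behaviour is to invert the pattern into an explicit exclusion list. The Python sketch below assumes the chromosome names are already available as a list (for instance, read from a BAM header or a `.fai` index); the function name and the sample names are hypothetical.

```python
import re


def chromosomes_to_exclude(names, include_pattern):
    """Return a comma-separated list of names NOT matching the include pattern."""
    keep = re.compile(include_pattern)
    return ",".join(name for name in names if not keep.search(name))


# Hypothetical chromosome names, for illustration only.
names = ["chr1", "chr25", "chrM", "chrUn_example1"]
exclude_arg = chromosomes_to_exclude(names, r"^chr\d+$")
print(exclude_arg)
```

The resulting string is intended for the existing `-e` option, assuming it accepts a comma-separated list of chromosome names.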

This feature would also simplify life for people seeking an easier way to #29.

malcook avatar Nov 25 '19 23:11 malcook

Thanks for the suggestion. Genrich analyzes the whole genome by default because that is how these assays work. ATAC-seq, ChIP-seq, etc. are performed on whole genomes, not just certain chromosomes or regions.

Nevertheless, I will consider the request. In the meantime, please use -e and -E, and let me know if there are any issues with them.

jsh58 avatar Nov 29 '19 21:11 jsh58

Thanks for the consideration. It is really a convenience that allows me to trial run an analysis on a fraction of the genome in the interest of debugging a larger workflow on a limited set of data. I am able to use -e effectively for this purpose to exclude all but one chromosome.

Thanks for Genrich!

malcook avatar Nov 29 '19 21:11 malcook

As a workaround, you can select the regions you want using bedtools intersect.

ScottNortonPhD avatar Jun 05 '20 20:06 ScottNortonPhD

bedtools intersect is unlikely to produce the correct result in this context.

jsh58 avatar Jun 07 '20 00:06 jsh58

A parameter to provide genome length directly would also be very helpful. We subset data frequently to run multiple different peak callers with various parameters to find the best settings for a given assay.

j-andrews7 avatar Apr 21 '22 16:04 j-andrews7

There is now a `-L <int>` command-line argument that can be used to set the genome length directly.

jsh58 avatar Mar 05 '23 17:03 jsh58