gatk
Create tool for producing genomic regions (as a BED file)
Feature request
Tool(s) or class(es) involved
This is a request for a new tool
GencodeRegionsAsBED
Description
Given a GENCODE GTF, create a BED file containing the genomic region of each gene, one gene per row.
Suggestion: This can be implemented as a FeatureWalker<GencodeGtfFeature>.
Requirements
- [P0] Take the union of all basic, coding transcripts to determine each gene's region. "basic" is a tag, defined by GENCODE, that appears on transcripts in the GTF.
- [P0] Include an option to emit one row per transcript instead of one row per gene. Please include both the gene and the transcript ID in the output BED. Transcript entries should be sorted in natural order (in this case, natural order and alphabetical order will be the same).
- [P0] Must support GENCODE v35 and above (through the latest release at the time of implementation).
- [P0] Supports hg38 (note that this is implicit in the GENCODE version)
- [P2] Include an option to build the BED file from both basic and non-basic transcripts.
- [P2] Include an option to build the BED file from both coding and non-coding transcripts.
- [P2] Include an option to break out exons, introns, UTRs, etc.
- [P2] Support hg19/b37, which means supporting earlier versions of GENCODE.
[P0] = "Must have. Cannot close this issue without this feature or without filing another issue. This tool is not considered complete without this feature."
[P2] = "Not required. This tool can be considered complete without this feature. No need to ask permission to drop it. If it is NOT delivered, please mention what P2's were not delivered in the closing comment of this issue."
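To make the first P0 requirement concrete, here is a rough sketch in plain Python rather than GATK code. It is not the proposed FeatureWalker implementation. The function names are hypothetical, and it assumes that "coding" means `transcript_type "protein_coding"` and that genes are keyed by contig plus `gene_name`; the issue fixes neither of these. It reads transcript records from a GENCODE GTF, keeps those tagged `basic`, and takes the min start and max end per gene.

```python
import re
from collections import defaultdict

# Matches quoted GTF attributes such as: tag "basic";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Map attribute key -> list of values (GENCODE repeats the `tag` key)."""
    attrs = defaultdict(list)
    for key, value in ATTR_RE.findall(field):
        attrs[key].append(value)
    return attrs


def gene_regions(gtf_lines):
    """Union of basic, protein-coding transcripts per (contig, gene name).

    Coordinates are returned exactly as written in the GTF
    (1-based, inclusive); no BED conversion happens here.
    """
    regions = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript records define the region
        attrs = parse_attributes(cols[8])
        if "basic" not in attrs["tag"]:
            continue
        # Assumption: "coding" == transcript_type "protein_coding".
        if attrs["transcript_type"] != ["protein_coding"]:
            continue
        if not attrs["gene_name"]:
            continue
        key = (cols[0], attrs["gene_name"][0])
        start, end = int(cols[3]), int(cols[4])
        if key in regions:
            old_start, old_end = regions[key]
            regions[key] = (min(start, old_start), max(end, old_end))
        else:
            regions[key] = (start, end)
    return regions
```

The two P2 options for non-basic and non-coding transcripts would amount to making the two filter checks above switchable.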
Example output
BED is tab-delimited...
...
chr22 21759657 21867680 MAPK1
...
With transcript option:
...
chr22 21759657 21867645 MAPK1,ENST00000215832.11
chr22 21769040 21867680 MAPK1,ENST00000398822.7
chr22 21769204 21867440 MAPK1,ENST00000544786.1
...
Note: The union of the transcript regions is reported when the transcript option is not present.
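For illustration only, the two output modes could be formatted roughly as below. This is a Python sketch with hypothetical helper names (`bed_line`, `transcript_lines`), not GATK code. One thing to settle during implementation: GTF coordinates are 1-based and inclusive, while BED starts are 0-based. The issue does not say which convention the example rows use, so the shift is a flag here. Sorting uses plain string comparison on the transcript ID, relying on the note above that natural and alphabetical order coincide.

```python
def bed_line(chrom, start, end, gene, transcript_id=None, shift_start=True):
    """Format one tab-delimited BED row from GTF-style coordinates.

    shift_start=True converts the 1-based GTF start to a 0-based BED start;
    the end is unchanged because BED ends are exclusive.
    """
    name = gene if transcript_id is None else f"{gene},{transcript_id}"
    bed_start = start - 1 if shift_start else start
    return "\t".join([chrom, str(bed_start), str(end), name])


def transcript_lines(transcripts, shift_start=True):
    """One row per transcript, ordered by contig, gene, then transcript ID."""
    ordered = sorted(transcripts, key=lambda t: (t[0], t[3], t[4]))
    return [bed_line(*t, shift_start=shift_start) for t in ordered]


# Coordinates and IDs taken from the example output above.
MAPK1 = [
    ("chr22", 21769204, 21867440, "MAPK1", "ENST00000544786.1"),
    ("chr22", 21759657, 21867645, "MAPK1", "ENST00000215832.11"),
    ("chr22", 21769040, 21867680, "MAPK1", "ENST00000398822.7"),
]

if __name__ == "__main__":
    # Gene row: the union of the three transcript regions.
    print(bed_line("chr22", 21759657, 21867680, "MAPK1", shift_start=False))
    # Transcript rows, sorted.
    for row in transcript_lines(MAPK1, shift_start=False):
        print(row)
```

With `shift_start=False` the rows carry the coordinates exactly as written in the example; with the default, each start is one lower.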