gencode_regions
Extract 3'UTR, 5'UTR, CDS, Promoter, Genes, Introns, Exons from GTF files
Data
If you only need the final output, the files are hosted on riboraptor, organized by genome build and GTF version.
Using Python
Dependencies
We recommend setting up a conda environment with Python>=3 and Python<=3.7, gffutils v0.9, and pybedtools:
conda create --name gencode_env python=3.7
conda activate gencode_env
conda install -c bioconda gffutils=0.9 pybedtools
Notebooks
- BDGP6
- GRCg6
- GRCz10
- GRCz11
- GRCh38
- MG1655
- GRCm10
- Mmul8
- panTro3
- Rnor6.0
- sacCerR64
- WBcel235
- Felis_catus9.0
The corresponding gzipped BED output files are in the data directory.
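To inspect one of these gzipped BED files without extra dependencies, a minimal Python sketch along the following lines may be enough. It assumes the files are tab-separated BED with at least three columns. The names `BedInterval` and `read_bed_gz` are made up for this example and are not part of the repository.

```python
import gzip
from typing import Iterator, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str    # "." when the file has fewer than 4 columns
    strand: str  # "." when the file has fewer than 6 columns


def read_bed_gz(path: str) -> Iterator[BedInterval]:
    """Yield intervals from a gzipped, tab-separated BED file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines, comments, and UCSC header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            name = fields[3] if len(fields) > 3 else "."
            strand = fields[5] if len(fields) > 5 else "."
            yield BedInterval(fields[0], int(fields[1]), int(fields[2]), name, strand)
```

For example, `sum(iv.end - iv.start for iv in read_bed_gz(path))` gives the total number of bases covered by a file, counting overlapping intervals more than once.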
Using R
Dependencies
- r>=3.2.1
- GenomicFeatures
Run
./create_regions_from_gencode.R <path_to_GFF/GTF> <path_to_output_dir>
This creates exons.bed, 3UTR.bed, 5UTR.bed, genes.bed, and cds.bed in <path_to_output_dir>.
Example
- Download GFF/GTF (GRCh38, v25, comprehensive, CHR) from gencodegenes.org:
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
&& gunzip gencode.v25.annotation.gff3.gz
- Create regions:
./create_regions_from_gencode.R gencode.v25.annotation.gff3 /path/to/GRCh38/annotation
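When comparing the input annotation with BED output, keep in mind that the two formats count coordinates differently: GTF/GFF intervals are 1-based with an inclusive end, while BED intervals are 0-based with an exclusive end. The sketch below illustrates that conversion in Python for a single feature line. It is not taken from create_regions_from_gencode.R. The function name `gtf_line_to_bed` and the choice to put the feature type in the BED name column are assumptions made for illustration.

```python
def gtf_line_to_bed(line: str) -> tuple:
    """Convert one tab-separated GTF/GFF feature line to a BED6-style tuple."""
    fields = line.rstrip("\n").split("\t")
    # GTF/GFF columns: seqname, source, feature, start, end, score, strand, ...
    chrom, _source, feature, start, end, _score, strand = fields[:7]
    # GTF/GFF: 1-based, end inclusive. BED: 0-based, end exclusive.
    # Only the start shifts by one; the end value stays the same.
    return (chrom, int(start) - 1, int(end), feature, 0, strand)


example = 'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972.5";'
chrom, bed_start, bed_end, name, score, strand = gtf_line_to_bed(example)
# The interval length is preserved: 12227 - 11869 + 1 == bed_end - bed_start
```

With this convention the length of a BED interval is simply `end - start`, which is why the start shifts and the end does not.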
First exons, Last exons
We use the GenePred format to simplify the process.
- Download gtfToGenePred
- Convert GTF to GenePred:
gtfToGenePred gencode.v25.annotation.gtf gencode.v25.annotation.genepred
- Extract first exons:
python genepred_to_bed.py --first_exon gencode.v25.annotation.genepred
- Extract last exons:
python genepred_to_bed.py --last_exon gencode.v25.annotation.genepred
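To show why GenePred makes this convenient, here is a minimal sketch of how first and last exons could be pulled from a GenePred line. It is an illustration only and not the code in genepred_to_bed.py. It assumes the standard 10-column GenePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with exons listed in ascending genomic order. The function names are hypothetical.

```python
def parse_genepred_exons(line: str):
    """Return (name, chrom, strand, [(start, end), ...]) from a GenePred line."""
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[0], fields[1], fields[2]
    # exonStarts / exonEnds are comma-separated and end with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return name, chrom, strand, list(zip(starts, ends))


def terminal_exon(line: str, which: str = "first") -> tuple:
    """Return the first or last exon, in transcription order, as a BED6-style tuple."""
    if which not in ("first", "last"):
        raise ValueError("which must be 'first' or 'last'")
    name, chrom, strand, exons = parse_genepred_exons(line)
    # On the + strand the first exon is the leftmost one;
    # on the - strand transcription runs right to left, so it is the rightmost.
    take_leftmost = (which == "first") == (strand == "+")
    start, end = exons[0] if take_leftmost else exons[-1]
    return (chrom, start, end, name, 0, strand)


record = "tx1\tchr1\t-\t100\t900\t150\t850\t3\t100,400,700,\t200,500,900,"
first = terminal_exon(record, "first")
last = terminal_exon(record, "last")
```

GenePred coordinates are already 0-based and half-open, so the exon boundaries can be written to BED without adjustment.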
Confused about exons and UTRs?
This should be helpful:
[Figure not shown. Source: Wikipedia]
or probably this:
[Figure not shown]
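As a complement to the figures, the relationship can also be sketched in code: UTRs are the parts of a transcript's exons that lie outside the coding region, and the strand decides which side is 5' and which is 3'. The example below assumes 0-based half-open coordinates and exons sorted by genomic position. The function name `split_utrs` is hypothetical, and treating `cds_start >= cds_end` as a non-coding transcript is an assumption of this sketch.

```python
def split_utrs(exons, cds_start, cds_end, strand):
    """Return (five_prime_utr, three_prime_utr) as lists of (start, end).

    exons: list of (start, end) tuples, 0-based half-open, ascending order.
    cds_start / cds_end: genomic bounds of the coding region.
    """
    if cds_start >= cds_end:
        # Assumption: an empty CDS marks a non-coding transcript, which has no UTRs.
        return [], []
    left, right = [], []
    for start, end in exons:
        if start < cds_start:
            # Exonic sequence to the left of the CDS (clipped at the CDS start).
            left.append((start, min(end, cds_start)))
        if end > cds_end:
            # Exonic sequence to the right of the CDS (clipped at the CDS end).
            right.append((max(start, cds_end), end))
    # On the + strand the left side is 5'; on the - strand it is 3'.
    return (left, right) if strand == "+" else (right, left)


exons = [(100, 200), (400, 500), (700, 900)]
five_utr, three_utr = split_utrs(exons, 150, 850, "+")
```

An exon that contains the start or stop of the coding region is therefore split: part of it is UTR and the rest is CDS, while introns belong to neither.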