genomic-features
TSS / promoter regions
Description of feature
Retrieve transcription start sites (TSSs) of genes for use with ATAC-seq data.
This could be similar to the GenomicFeatures::promoters function (description available in this vignette):
"The promoters function computes a GRanges object that spans the promoter region around the transcription start site for the transcripts in a TxDb object. The upstream and downstream arguments define the number of bases upstream and downstream from the transcription start site that make up the promoter region."
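A minimal sketch of the promoter definition for a single transcript, assuming 1-based closed coordinates as in Bioconductor and the GenomicFeatures defaults of upstream=2000 and downstream=200. The function names `tss` and `promoter` are hypothetical, and only '+' and '-' strands are handled. On the minus strand the TSS is the transcript end, so "upstream" points to higher coordinates.

```python
from typing import Tuple


def tss(start: int, end: int, strand: str) -> int:
    """Return the 1-based TSS of a transcript spanning start..end (closed)."""
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def promoter(start: int, end: int, strand: str,
             upstream: int = 2000, downstream: int = 200) -> Tuple[int, int]:
    """Promoter around the TSS as a 1-based closed (start, end) pair."""
    site = tss(start, end, strand)
    if strand == "+":
        # upstream bases precede the TSS; the TSS is the first downstream base
        return site - upstream, site + downstream - 1
    # minus strand: downstream runs toward lower coordinates
    return site - downstream + 1, site + upstream


# ENST00000456328 (1:11869-14409, +) from the EnsDb example in this issue
print(promoter(11869, 14409, "+"))  # (9869, 12068)
```

The result matches the range returned by promoters() for that transcript in the R output.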
This could be done using bioframe.expand (though that is currently not strand-aware: https://github.com/open2c/bioframe/issues/144). Alternatively, it could be done with ibis using ifelse.
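A rough sketch of a strand-aware expansion over a whole table. It uses pandas with numpy.where as a stand-in for the per-row ifelse suggested for ibis, and assumes bioframe-style columns (chrom, start, end, strand) in 0-based half-open coordinates. The function name `strand_aware_promoters` is hypothetical. Starts near the beginning of a chromosome can become negative; clipping to chromosome bounds is left out.

```python
import numpy as np
import pandas as pd


def strand_aware_promoters(df: pd.DataFrame, upstream: int = 2000,
                           downstream: int = 200) -> pd.DataFrame:
    """Return a copy of df with start/end replaced by promoter coordinates.

    Assumes 0-based half-open chrom/start/end/strand columns, strand '+' or '-'.
    """
    if not df["strand"].isin(["+", "-"]).all():
        raise ValueError("strand must be '+' or '-' for every row")
    plus = df["strand"] == "+"
    out = df.copy()
    # np.where picks the formula per row by strand, like ifelse in ibis
    out["start"] = np.where(plus, df["start"] - upstream, df["end"] - downstream)
    out["end"] = np.where(plus, df["start"] + downstream, df["end"] + upstream)
    return out


tx = pd.DataFrame({
    "chrom": ["1", "1"],
    "start": [11868, 99999],   # 0-based starts
    "end": [14409, 120000],
    "strand": ["+", "-"],
})
print(strand_aware_promoters(tx))
```

In 0-based half-open terms the '+' row becomes [9868, 12068), i.e. 9869-12068 in 1-based closed coordinates.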
- Add an option to subset to canonical transcripts only (as indicated in the genes table), and add the gene ID
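One way the canonical-transcript option could work is an inner join between the transcripts table and the genes table, which both subsets and attaches the gene ID. This is a sketch with hypothetical column names (`tx_id`, `gene_id`, and `canonical_tx_id` for the genes table's canonical-transcript column) and a hypothetical function name `canonical_only`.

```python
import pandas as pd


def canonical_only(transcripts: pd.DataFrame, genes: pd.DataFrame) -> pd.DataFrame:
    """Keep only canonical transcripts and add their gene_id.

    Assumes genes has columns gene_id and canonical_tx_id (hypothetical names).
    """
    canon = genes[["gene_id", "canonical_tx_id"]].rename(
        columns={"canonical_tx_id": "tx_id"})
    # drop any existing gene_id so the merge does not create gene_id_x / gene_id_y
    tx = transcripts.drop(columns=["gene_id"], errors="ignore")
    # inner join: non-canonical transcripts have no match and are dropped
    return tx.merge(canon, on="tx_id", how="inner")


transcripts = pd.DataFrame({"tx_id": ["T1", "T2", "T3"],
                            "chrom": ["1", "1", "2"],
                            "start": [100, 150, 500],
                            "end": [900, 800, 700],
                            "strand": ["+", "+", "-"]})
genes = pd.DataFrame({"gene_id": ["G1", "G2"],
                      "canonical_tx_id": ["T1", "T3"]})
print(canonical_only(transcripts, genes))
```

Doing this before computing promoters keeps it consistent with transcript-level filtering.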
Notes on filtering behaviour: in Bioconductor, the filtering is done at the level of transcripts, before the promoter ranges are defined.
Example: if the promoter lies within the filtered range but the transcript does not, the promoter is not returned:
> transcripts(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='any'))
GRanges object with 1 range and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
ENST00000456328 1 11869-14409 + | ENST00000456328 processed_transcript <NA> <NA> ENSG00000223972 ENST00000456328
> promoters(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='any'))
GRanges object with 1 range and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
ENST00000456328 1 9869-12068 + | ENST00000456328 processed_transcript <NA> <NA> ENSG00000223972 ENST00000456328
-------
seqinfo: 1 sequence from GRCh38 genome
With type='within' filtering:
> transcripts(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='within'))
GRanges object with 0 ranges and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
-------
seqinfo: no sequences
> promoters(EnsDb.Hsapiens.v86, filter=GRangesFilter(GRanges('1:9000-12000'), type='within'))
GRanges object with 0 ranges and 6 metadata columns:
seqnames ranges strand | tx_id tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id tx_name
<Rle> <IRanges> <Rle> | <character> <character> <integer> <integer> <character> <character>
-------
seqinfo: no sequences
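A small sketch of the "filter transcripts first, then compute promoters" order, reproducing the EnsDb example with plain tuples. It assumes 1-based closed coordinates, and that 'any' means the transcript overlaps the region while 'within' means it lies entirely inside it. The helper names are hypothetical and this is not the EnsDb implementation.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int, str]  # (seqname, start, end, strand), 1-based closed
Region = Tuple[str, int, int]


def keep(tx: Interval, region: Region, filter_type: str) -> bool:
    """Transcript-level filter, applied before any promoter is computed."""
    chrom, rstart, rend = region
    name, start, end, _ = tx
    if name != chrom:
        return False
    if filter_type == "any":
        return start <= rend and end >= rstart
    if filter_type == "within":
        return start >= rstart and end <= rend
    raise ValueError(f"unknown filter type {filter_type!r}")


def promoters_after_filter(txs: List[Interval], region: Region, filter_type: str,
                           upstream: int = 2000, downstream: int = 200) -> List[Interval]:
    out = []
    for tx in txs:
        if not keep(tx, region, filter_type):
            continue  # the promoter is never considered for a dropped transcript
        chrom, start, end, strand = tx
        if strand == "+":
            out.append((chrom, start - upstream, start + downstream - 1, strand))
        else:
            out.append((chrom, end - downstream + 1, end + upstream, strand))
    return out


txs = [("1", 11869, 14409, "+")]  # ENST00000456328
region = ("1", 9000, 12000)
print(promoters_after_filter(txs, region, "any"))     # [('1', 9869, 12068, '+')]
print(promoters_after_filter(txs, region, "within"))  # []
```

Filtering on the promoter ranges instead would mean computing promoters for all transcripts and then applying the region filter to the result.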