HiC-Pro icon indicating copy to clipboard operation
HiC-Pro copied to clipboard

How to generate the gene.gr input for HiTC pca.hic function?

Open jennymoon90 opened this issue 5 years ago • 3 comments

Hi, I am trying to use the pca.hic function to call A/B compartments. How may I generate the object of class GenomicRanges describing the genes position in hg19 (gene.gr in the pca.hic function)? Thank you, Jenny

jennymoon90 avatar Sep 27 '19 01:09 jennymoon90

Hi Jenny, You can simply load a gtf file of gene expression and then build a GenomicRanges object as ;

require(GenomicFeatures)
gtf.in <- "Mus_musculus.NCBIM37.cleaned.gtf"
txdb <- makeTxDbFromGFF(gtf.in,format="gtf", dataSource="GTF")
geneannot <- genes(txdb)
seqlevels(geneannot)<-paste0("chr",seqlevels(geneannot))
geneannot <- sort(geneannot)

Best

nservant avatar Oct 03 '19 11:10 nservant

Thank you, Nicolas. I have a few follow-up questions:

  1. Which gif should I use for human (hg19)?

  2. I used the following codes, and I don't know whether it makes a difference: txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene transcript <- transcriptsBy(txdb, "gene") gene <- reduce(transcript) genes=unlist(gene) chr="chr1" genes_chr=genes[seqnames(genes)==chr] pr <- pca.hic(x,normPerExpected = T, npc=1,asGRangesList = T, gene.gr = genes_chr)

  3. It seems from the HiTC tutorial that pca.hic does the getPearsonMap function as the first step. So does it center the observed/expected map before calculating the Pearson correlation by default?

  4. If I have 6 samples from two genotypes (n=3 for each genotype group), and I have one Hi-C map for each sample, so 6 maps total. Do you think it makes sense to get compartment scores from each map, combine them into one data frame, and then call differential compartment scores for each bin using t-test, sort of like what people do with transcriptomic profiles? In other words, does it make sense to compare the compartment score values across different samples?

Thank you, Jenny

jennymoon90 avatar Jan 06 '20 20:01 jennymoon90

@jennymoon90 the GTF you could use for hg19 would probably be from Gencode (https://www.gencodegenes.org/human/release_19.html). You can then import this into R with:

library(rtracklayer)
library(GenomicRanges)
gencode_hg19_gtf <- import("path/to/my/gtf")

#this will come in as a GRanges object with gene names, Ensembl IDs, etc
#keep just the genes
#GRanges object of hg19 genes
gencode_hg19_gtf.genes <- gencode_hg19_gtf[gencode_hg19_gtf$type == "gene",]

#swap back and forth between UCSC and Ensembl style chromosome naming
#note that Gencode is UCSC style - e.g. chr1
#go to Ensembl - e.g. 1
seqlevelsStyle(gencode_hg19_gtf.genes) <- "Ensembl"
#back to UCSC
seqlevelsStyle(gencode_hg19_gtf.genes) <- "UCSC"

I would be really careful about doing compartment differences between genotypes before first making sure the compartments within your genotype/group are largely consistent. This is because the sign of the eigenvectors is assigned arbitrarily (e.g. 'open' could be a negative eigenvalue for a chromosome or sample). This is a feature and not a bug of SVD/PCA :) I'm unsure how HiTC handles the check that 'open' compartments are positive eigenvalues. The t-test approach gets weird on a bin level since there is an inherent correlation structure to how compartments/3D genomic structures interact with each other. Good luck!

biobenkj avatar Mar 30 '20 12:03 biobenkj