HiC-Pro
HiC-Pro copied to clipboard
How to generate the gene.gr input for HiTC pca.hic function?
Hi, I am trying to use the pca.hic function to call A/B compartments. How may I generate the object of class GenomicRanges describing the genes position in hg19 (gene.gr in the pca.hic function)? Thank you, Jenny
Hi Jenny, You can simply load a gtf file of gene expression and then build a GenomicRanges object as ;
require(GenomicFeatures)
gtf.in <- "Mus_musculus.NCBIM37.cleaned.gtf"
txdb <- makeTxDbFromGFF(gtf.in,format="gtf", dataSource="GTF")
geneannot <- genes(txdb)
seqlevels(geneannot)<-paste0("chr",seqlevels(geneannot))
geneannot <- sort(geneannot)
Best
Thank you, Nicolas. I have a few follow-up questions:
-
Which gif should I use for human (hg19)?
-
I used the following codes, and I don't know whether it makes a difference:
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene transcript <- transcriptsBy(txdb, "gene") gene <- reduce(transcript) genes=unlist(gene) chr="chr1" genes_chr=genes[seqnames(genes)==chr] pr <- pca.hic(x,normPerExpected = T, npc=1,asGRangesList = T, gene.gr = genes_chr)
-
It seems from the HiTC tutorial that pca.hic does the getPearsonMap function as the first step. So does it center the observed/expected map before calculating the Pearson correlation by default?
-
If I have 6 samples from two genotypes (n=3 for each genotype group), and I have one Hi-C map for each sample, so 6 maps total. Do you think it makes sense to get compartment scores from each map, combine them into one data frame, and then call differential compartment scores for each bin using t-test, sort of like what people do with transcriptomic profiles? In other words, does it make sense to compare the compartment score values across different samples?
Thank you, Jenny
@jennymoon90 the GTF you could use for hg19 would probably be from Gencode (https://www.gencodegenes.org/human/release_19.html). You can then import this into R with:
library(rtracklayer)
library(GenomicRanges)
gencode_hg19_gtf <- import("path/to/my/gtf")
#this will come in as a GRanges object with gene names, Ensembl IDs, etc
#keep just the genes
#GRanges object of hg19 genes
gencode_hg19_gtf.genes <- gencode_hg19_gtf[gencode_hg19_gtf$type == "gene",]
#swap back and forth between UCSC and Ensembl style chromosome naming
#note that Gencode is UCSC style - e.g. chr1
#go to Ensembl - e.g. 1
seqlevelsStyle(gencode_hg19_gtf.genes) <- "Ensembl"
#back to UCSC
seqlevelsStyle(gencode_hg19_gtf.genes) <- "UCSC"
I would be really careful about doing compartment differences between genotypes before first making sure the compartments within your genotype/group are largely consistent. This is because the sign of the eigenvectors is assigned arbitrarily (e.g. 'open' could be a negative eigenvalue for a chromosome or sample). This is a feature and not a bug of SVD/PCA :) I'm unsure how HiTC handles the check that 'open' compartments are positive eigenvalues. The t-test approach gets weird on a bin level since there is an inherent correlation structure to how compartments/3D genomic structures interact with each other. Good luck!