odd result from Lee et al. 2019 data
Hi, I was using ALLCools on the Lee et al. 2019 single-cell methylation data (https://pubmed.ncbi.nlm.nih.gov/31501549/). The code I use to generate the mcds data is something like below:
allcools generate-dataset --allc_table /p/keles/schic/volumeD/HumanPatchSeq/allcfiles.tsv --output_path /p/keles/schic/volumeD/HumanPatchSeq/Lee2019.mcds --chrom_size_path hg19.chrom.sizes--obs_dim cell --cpu 30 --chunk_size 50 --regions geneslop2k /p/keles/schic/volumeD/HumanPatchSeq/hg19.bed --quantifiers geneslop2k count CGN,CHN
I load the output data like what is in the example. However, when I look at the data using mcds.get_adata(mc_type = "CHN").to_df(), the data shown is like below:

Some cells are consistently 10 among all genes. Here is the result if you want to check it. https://drive.google.com/drive/folders/1PQ5jhvTFuRsm9qn8oVDRS9MzCkiMQnx3?usp=share_link
Is this the problem of the data of am I doing something wrong?
Edit: I guess it is the easiest if you can tell me how you do integration between single-cell methylation and other transcriptomics data in a bit more detail? Especially how to form the cell by gene matrix for methylation? Previously, I directly calculate mCH fraction by first filtering out those CG reads, and among all the rest CH reads, for each cell, I calculate the mean of the fifth column in the gene body divided by the mean methylation of the whole cell. However, this approach failed. Below is the r file if you are interested.
library(foreach)
data(Annotations)
# generate a gene annotation data for hg19.
colnames(hg19Annotations) = c("chr", "start", "end", "strand", "names")
hg19Annotations = hg19Annotations[order(hg19Annotations$names), ]
genes = makeGRangesFromDataFrame(hg19Annotations, keep.extra.columns = TRUE)
# list all the files, all of them are tsv.gz methylation files.
filenames <- list.files('/storage08/kwangmoon/data/Lee2019/methylation/rawdata', full.names = TRUE)
names <- list.files('/storage08/kwangmoon/data/Lee2019/methylation/rawdata', full.names = FALSE)
registerDoParallel(8)
foreach(cell = 1:length(filenames)) %dopar% {
tmp = fread(filenames[cell], select = c(1, 2, 4, 5, 6), fill = TRUE)
if(!grepl('chr', tmp$V1[1])){tmp$V1 = paste0("chr", tmp$V1)} #add "chr" in the front of chromosomes
tmpmCG <- tmp %>% filter(!grepl("CG", V4)) # eliminate all CG and keep only CH.
tmpmCG <- tmpmCG %>% filter(V1 %in% paste0("chr", c(1:22, "X", "Y"))) # only use chr1-22, X, Y.
methy = GRanges(tmpmCG$V1, IRanges(tmpmCG$V2, width = 1))
overlapPos = findOverlaps(methy, genes) # get all the reads that overlap with genes.
output = data.frame(tmpmCG[queryHits(overlapPos), ],
genes = hg19Annotations[subjectHits(overlapPos), ]$names)
mCHmean = mean(output$V5) # this is the mean methylation for all cells
write.table(output %>% group_by(genes) %>% summarize(Methyl_rate = mean(V5) / mCHmean), # we calculate mean methylation in the gene body divide by the mean methylation of all cells
file = paste0("/p/keles/schic/volumeD/HumanPatchSeq/MethylGene/", names[cell]), row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")
}