ArchR
ArchR copied to clipboard
Error with `import10xFeatureMatrix` when reading 6 files: rownames (genes) of individual RNA objects are not equivalent
Hi!
I got the following error using 'import10xFeatureMatrix' with 6 RNA matrices, at the step of "Merging individual RNA objects":
Error in import10xFeatureMatrix(input = c("/path/filtered_feature_bc_matrix.h5", : Error - rownames (genes) of individual RNA objects are not equivalent.
I used ArchR_1.0.2
The file was correctly read when I run 'import10xFeatureMatrix' alone.
Thanks!
ArchR-addGeneExpressionMatrix-13dd2cee9bfa-Date-2022-05-21_Time-23-07-01.log
Hi @raquel-bc! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.
This is a sanity check imposed on your input data. The multiple H5 files that you are trying to important do not contain the same gene names. I've never encountered this from a user before but it seems like a red flag to me.
Can you describe what your input is and (if you know how) examine the H5 files to see why your genes are different across the multiple input files? If you find a compelling reason for why this might happen, I can modify the function to allow this type of input by only using the gene names that match across the multiple input files.
Oh, I think I know this one. For reasons I do not understand, the row order is different (and possibly the resulting rowData) on the SCE that come in via this import process. Here's what I did to fix the issue. It is just two samples, but you can probably generalize:
seRNA <- import10xFeatureMatrix(
input = c(RNApath1,RNApath2),
names = c("N1_2","gp120")
)
#Error in combining individual feature matrices! Returning as a list of individual feature matrices!
#So, maybe we need to add some colData and try again?
colData(seRNA[[1]])$Sample = "N1_2"
colData(seRNA[[2]])$Sample = "gp120"
#That doesn't do it...
#Looks like the order of the features may be different?
seRNA[[1]]<-seRNA[[1]][rownames(seRNA[[2]]),] #confirm with identical(rownames(seRNA[[1]]),rownames(seRNA[[2]]))
rowRanges(seRNA[[1]])<-rowRanges(seRNA[[2]])
seRNA_combined<-cbind(seRNA[[1]],seRNA[[2]])
proj <- addGeneExpressionMatrix(input = proj, seRNA = seRNA_combined)
@dagarfield - I think that particular issue I fixed a few weeks back in release_1.0.2
. Now, if multiple h5 files are provided, I re-order them based on genomic coordinates:
https://github.com/GreenleafLab/ArchR/blob/706b88a80570f814b01959a5d9d536adf7ac64b4/R/MultiModal.R#L43-L46
There is something funky going on in cellranger that gives inconsistent gene/transcript definition (described here - https://github.com/GreenleafLab/ArchR/issues/507#issuecomment-1063139673)
I could add a force
parameter here to convert some of the sanity check errors to warnings. in my original thought process, h5 files with non-identical gene names was an error not a warning but it would prevent people from importing data easily.
@raquel-bc - any update on your end?
closing due to inactivity.
actually, nevermind. I'll leave this one open. We can make it so that mis-matched genes are just excluded.
@rcorces Hi Ryan, Thanks for your excellent single-cell analysis tool. I have the same issue when using 'import10xFeatureMatrix' with 4 multiome sample data. I am 100% sure the problem is cellranger, which gives inconsistent gene/transcript definitions. Because if I input the RNA matrix individually, all samples could be input RNA matrix successfully. I am also testing three of my samples that could be input RNA matrix together without error. There is only one sample not compatible with other samples. I would appreciate it if you could add a force parameter to convert some of the sanity check errors to warnings or tell us how to do that. Thanks!
@Achernar121 - I've made quite a few changes to import10xFeatureMatrix()
in the dev_geneMismatch
branch. I havent created a test dataset with gene mismatches for this so this is more or less a blind fix. If I have time, I'll create a test dataset but would be great if you could test and report back with details.
To install:
devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories())
#to unload a package and reload
detach("package:ArchR", unload=TRUE)
library(ArchR)
@Achernar121 - I've made quite a few changes to
import10xFeatureMatrix()
in thedev_geneMismatch
branch. I havent created a test dataset with gene mismatches for this so this is more or less a blind fix. If I have time, I'll create a test dataset but would be great if you could test and report back with details.To install:
devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories()) #to unload a package and reload detach("package:ArchR", unload=TRUE) library(ArchR)
Thank you so much! It works great for me. Please tell me what kinds of detail should I report; I will try my best to gather information!
I was hoping to see what the console output looks like on your end. And maybe you could just spot check a few genes (preferably with names starting with letters that are later in the alphabet) by looking at UMAP embedding of the ATAC and RNA signal to make sure that there isnt some sort of shuffling that occurred when removing the genes.
For example, if the console output says that it is removing the gene "CD4" then I might worry that genes later in the alphabet (i.e. "IRF8") might get shuffled accidentally during the removal procedure.
My previous request wasnt quite right. It isnt about alphabetical ordering, its about chromosome based ordering. So if a gene is removed on chr4
check out genes on later chromosomes.
I was hoping to see what the console output looks like on your end. And maybe you could just spot check a few genes (preferably with names starting with letters that are later in the alphabet) by looking at UMAP embedding of the ATAC and RNA signal to make sure that there isnt some sort of shuffling that occurred when removing the genes.
For example, if the console output says that it is removing the gene "CD4" then I might worry that genes later in the alphabet (i.e. "IRF8") might get shuffled accidentally during the removal procedure.
I got error when using addGeneExpressionMatrix( )
function
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"SummarizedExperiment"’
Then I run seqnames(seRNA)
, got the same error:
seqnames(seRNA) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"SummarizedExperiment"’
Then I tested three of my samples that could be input RNA matrix together without error before. After import10xFeatureMatrix()
as seRNA2
, The seqnames(seRNA2)
have same error. Then I switch back to the public version of ArchR 1.0.2. Using import10xFeatureMatrix()
again with three sample as seRNA3
. This time there is no error:
> seqnames(seRNA3)
factor-Rle of length 36578 with 39 runs
Lengths: 3409 2540 1892 ... 6 1 3
Values : chr1 chr2 chr3 ... KI270728.1 KI270731.1 KI270734.1
Levels(39): chr1 chr2 chr3 chr4 ... KI270727.1 KI270728.1 KI270731.1 KI270734.1
It seems that the new import10xFeatureMatrix()
function will output a defective SummarizedExperiment
class which couldn't get seqnames
The seqnames slot provides the chromosome name that can then be used to get the relevant pileup from bam files. Maybe the new import10xFeatureMatrix()
function will affect packages 'GenomicRanges' ?
thanks. This appears to be a separate bug introduced recently. Will report back soon.
I believe that all of this has now been fixed on dev_geneMismatch
if you could test and confirm on your end, I would appreciate it.
I tried both the master and dev_geneMismatch versions, the same error in addPeak2GeneLinks with GeneExpressionMatrix persists: "Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL".
I don't know if it is due to some genenames in my gene list/GeneExpressionMatrix are not in the gene annotation of ArchR, so ArchR cannot find the genomic locations of these genes.
devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories()) devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories())
proj <- addPeak2GeneLinks(
- ArchRProj = proj,
- reducedDims = "LSI_Combined",
- useMatrix = "GeneExpressionMatrix",
- dimsToUse = 1:30,
- scaleDims = NULL,
- corCutOff = 0.75,
- k = 100,
- knnIteration = 500,
- overlapCutoff = 0.8,
- maxDist = 250000,
- scaleTo = 10^4,
- log2Norm = TRUE,
- predictionCutoff = 0.4,
- seed = 1,
- threads = max(floor(getArchRThreads()/2), 1),
- verbose = TRUE,
- logFile = createLogFile("addPeak2GeneLinks")
- ) ArchR logging to : ArchRLogs/ArchR-addPeak2GeneLinks-9fb26d3ada1b-Date-2022-08-24_Time-11-59-59.log If there is an issue, please report to github with logFile! 2022-08-24 12:00:01 : Getting Available Matrices, 0.031 mins elapsed. No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! 2022-08-24 12:00:10 : Filtered Low Prediction Score Cells (0 of 86926, 0), 0.017 mins elapsed. 2022-08-24 12:00:12 : Computing KNN, 0.051 mins elapsed. 2022-08-24 12:00:13 : Identifying Non-Overlapping KNN pairs, 0.058 mins elapsed. 2022-08-24 12:00:17 : Identified 498 Groupings!, 0.125 mins elapsed. Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL"’
@fe4960 - its unclear exactly what you've done as your post lacks sufficient details. After installing the dev_geneMismatch
branch as shown below, you will need to re-run import10xFeatureMatrix()
and addGeneExpressionMatrix()
prior to re-running addPeak2GeneLinks()
devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories())
#to unload a package and reload
detach("package:ArchR", unload=TRUE)
library(ArchR)
Previously I did not re-run import10xFeatureMatrix() and addGeneExpressionMatrix(), they did not work. This time I installed the dev_geneMismatch branch as below and re-run import10xFeatureMatrix(), addGeneExpressionMatrix(), and addPeak2GeneLinks(). They all work. Thanks so much!
devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories()) #to unload a package and reload detach("package:ArchR", unload=TRUE) library(ArchR)
This has been merged into dev
and will become part of release_1.0.3
Hey @rcorces any idea when 1.0.3
would be released?
Approximately 1 month from now but I've been saying that for 2 months already. You can always work off of dev
. Timelines are hard for me to estimate.