ArchR icon indicating copy to clipboard operation
ArchR copied to clipboard

Error with `import10xFeatureMatrix` when reading 6 files: rownames (genes) of individual RNA objects are not equivalent

Open raquel-bc opened this issue 2 years ago • 9 comments

Hi!

I got the following error using 'import10xFeatureMatrix' with 6 RNA matrices, at the step of "Merging individual RNA objects": Error in import10xFeatureMatrix(input = c("/path/filtered_feature_bc_matrix.h5", : Error - rownames (genes) of individual RNA objects are not equivalent.

I used ArchR_1.0.2

The file was correctly read when I run 'import10xFeatureMatrix' alone.

Thanks!

ArchR-addGeneExpressionMatrix-13dd2cee9bfa-Date-2022-05-21_Time-23-07-01.log

raquel-bc avatar May 22 '22 00:05 raquel-bc

Hi @raquel-bc! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example. 3. Did you post your log file? If not, add it now. 4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces avatar May 22 '22 00:05 rcorces

This is a sanity check imposed on your input data. The multiple H5 files that you are trying to important do not contain the same gene names. I've never encountered this from a user before but it seems like a red flag to me.

Can you describe what your input is and (if you know how) examine the H5 files to see why your genes are different across the multiple input files? If you find a compelling reason for why this might happen, I can modify the function to allow this type of input by only using the gene names that match across the multiple input files.

rcorces avatar May 23 '22 13:05 rcorces

Oh, I think I know this one. For reasons I do not understand, the row order is different (and possibly the resulting rowData) on the SCE that come in via this import process. Here's what I did to fix the issue. It is just two samples, but you can probably generalize:

seRNA <- import10xFeatureMatrix(
  input = c(RNApath1,RNApath2),
  names = c("N1_2","gp120")
)
#Error in combining individual feature matrices! Returning as a list of individual feature matrices!
#So, maybe we need to add some colData and try again?
colData(seRNA[[1]])$Sample = "N1_2"
colData(seRNA[[2]])$Sample = "gp120"
#That doesn't do it...
#Looks like the order of the features may be different?
seRNA[[1]]<-seRNA[[1]][rownames(seRNA[[2]]),] #confirm with identical(rownames(seRNA[[1]]),rownames(seRNA[[2]]))
rowRanges(seRNA[[1]])<-rowRanges(seRNA[[2]])
seRNA_combined<-cbind(seRNA[[1]],seRNA[[2]])

proj <- addGeneExpressionMatrix(input = proj, seRNA = seRNA_combined)

dagarfield avatar Jun 17 '22 00:06 dagarfield

@dagarfield - I think that particular issue I fixed a few weeks back in release_1.0.2 . Now, if multiple h5 files are provided, I re-order them based on genomic coordinates: https://github.com/GreenleafLab/ArchR/blob/706b88a80570f814b01959a5d9d536adf7ac64b4/R/MultiModal.R#L43-L46 There is something funky going on in cellranger that gives inconsistent gene/transcript definition (described here - https://github.com/GreenleafLab/ArchR/issues/507#issuecomment-1063139673)

I could add a force parameter here to convert some of the sanity check errors to warnings. in my original thought process, h5 files with non-identical gene names was an error not a warning but it would prevent people from importing data easily.

@raquel-bc - any update on your end?

rcorces avatar Jun 17 '22 03:06 rcorces

closing due to inactivity.

rcorces avatar Aug 11 '22 00:08 rcorces

actually, nevermind. I'll leave this one open. We can make it so that mis-matched genes are just excluded.

rcorces avatar Aug 11 '22 00:08 rcorces

@rcorces Hi Ryan, Thanks for your excellent single-cell analysis tool. I have the same issue when using 'import10xFeatureMatrix' with 4 multiome sample data. I am 100% sure the problem is cellranger, which gives inconsistent gene/transcript definitions. Because if I input the RNA matrix individually, all samples could be input RNA matrix successfully. I am also testing three of my samples that could be input RNA matrix together without error. There is only one sample not compatible with other samples. I would appreciate it if you could add a force parameter to convert some of the sanity check errors to warnings or tell us how to do that. Thanks!

Achernar121 avatar Aug 16 '22 16:08 Achernar121

@Achernar121 - I've made quite a few changes to import10xFeatureMatrix() in the dev_geneMismatch branch. I havent created a test dataset with gene mismatches for this so this is more or less a blind fix. If I have time, I'll create a test dataset but would be great if you could test and report back with details.

To install:

devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories())
#to unload a package and reload
detach("package:ArchR", unload=TRUE)
library(ArchR)

rcorces avatar Aug 18 '22 13:08 rcorces

@Achernar121 - I've made quite a few changes to import10xFeatureMatrix() in the dev_geneMismatch branch. I havent created a test dataset with gene mismatches for this so this is more or less a blind fix. If I have time, I'll create a test dataset but would be great if you could test and report back with details.

To install:

devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories())
#to unload a package and reload
detach("package:ArchR", unload=TRUE)
library(ArchR)

Thank you so much! It works great for me. Please tell me what kinds of detail should I report; I will try my best to gather information!

Achernar121 avatar Aug 21 '22 05:08 Achernar121

I was hoping to see what the console output looks like on your end. And maybe you could just spot check a few genes (preferably with names starting with letters that are later in the alphabet) by looking at UMAP embedding of the ATAC and RNA signal to make sure that there isnt some sort of shuffling that occurred when removing the genes.

For example, if the console output says that it is removing the gene "CD4" then I might worry that genes later in the alphabet (i.e. "IRF8") might get shuffled accidentally during the removal procedure.

rcorces avatar Aug 22 '22 13:08 rcorces

My previous request wasnt quite right. It isnt about alphabetical ordering, its about chromosome based ordering. So if a gene is removed on chr4 check out genes on later chromosomes.

rcorces avatar Aug 23 '22 12:08 rcorces

I was hoping to see what the console output looks like on your end. And maybe you could just spot check a few genes (preferably with names starting with letters that are later in the alphabet) by looking at UMAP embedding of the ATAC and RNA signal to make sure that there isnt some sort of shuffling that occurred when removing the genes.

For example, if the console output says that it is removing the gene "CD4" then I might worry that genes later in the alphabet (i.e. "IRF8") might get shuffled accidentally during the removal procedure.

I got error when using addGeneExpressionMatrix( ) function Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"SummarizedExperiment"’

Then I run seqnames(seRNA), got the same error: seqnames(seRNA) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"SummarizedExperiment"’

Then I tested three of my samples that could be input RNA matrix together without error before. After import10xFeatureMatrix()as seRNA2, The seqnames(seRNA2) have same error. Then I switch back to the public version of ArchR 1.0.2. Using import10xFeatureMatrix() again with three sample as seRNA3. This time there is no error:

> seqnames(seRNA3)
factor-Rle of length 36578 with 39 runs
  Lengths:       3409       2540       1892 ...          6          1          3
  Values : chr1       chr2       chr3       ... KI270728.1 KI270731.1 KI270734.1
Levels(39): chr1 chr2 chr3 chr4 ... KI270727.1 KI270728.1 KI270731.1 KI270734.1

It seems that the new import10xFeatureMatrix() function will output a defective SummarizedExperiment class which couldn't get seqnames The seqnames slot provides the chromosome name that can then be used to get the relevant pileup from bam files. Maybe the new import10xFeatureMatrix() function will affect packages 'GenomicRanges' ?

Achernar121 avatar Aug 23 '22 18:08 Achernar121

thanks. This appears to be a separate bug introduced recently. Will report back soon.

rcorces avatar Aug 23 '22 20:08 rcorces

I believe that all of this has now been fixed on dev_geneMismatch if you could test and confirm on your end, I would appreciate it.

rcorces avatar Aug 24 '22 16:08 rcorces

I tried both the master and dev_geneMismatch versions, the same error in addPeak2GeneLinks with GeneExpressionMatrix persists: "Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL".

I don't know if it is due to some genenames in my gene list/GeneExpressionMatrix are not in the gene annotation of ArchR, so ArchR cannot find the genomic locations of these genes.

devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories()) devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories())

proj <- addPeak2GeneLinks(

  • ArchRProj = proj,
  • reducedDims = "LSI_Combined",
  • useMatrix = "GeneExpressionMatrix",
  • dimsToUse = 1:30,
  • scaleDims = NULL,
  • corCutOff = 0.75,
  • k = 100,
  • knnIteration = 500,
  • overlapCutoff = 0.8,
  • maxDist = 250000,
  • scaleTo = 10^4,
  • log2Norm = TRUE,
  • predictionCutoff = 0.4,
  • seed = 1,
  • threads = max(floor(getArchRThreads()/2), 1),
  • verbose = TRUE,
  • logFile = createLogFile("addPeak2GeneLinks")
  • ) ArchR logging to : ArchRLogs/ArchR-addPeak2GeneLinks-9fb26d3ada1b-Date-2022-08-24_Time-11-59-59.log If there is an issue, please report to github with logFile! 2022-08-24 12:00:01 : Getting Available Matrices, 0.031 mins elapsed. No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! No predictionScore found. Continuing without predictionScore! 2022-08-24 12:00:10 : Filtered Low Prediction Score Cells (0 of 86926, 0), 0.017 mins elapsed. 2022-08-24 12:00:12 : Computing KNN, 0.051 mins elapsed. 2022-08-24 12:00:13 : Identifying Non-Overlapping KNN pairs, 0.058 mins elapsed. 2022-08-24 12:00:17 : Identified 498 Groupings!, 0.125 mins elapsed. Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL"’

fe4960 avatar Aug 24 '22 17:08 fe4960

@fe4960 - its unclear exactly what you've done as your post lacks sufficient details. After installing the dev_geneMismatch branch as shown below, you will need to re-run import10xFeatureMatrix() and addGeneExpressionMatrix() prior to re-running addPeak2GeneLinks()

devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories())
#to unload a package and reload
detach("package:ArchR", unload=TRUE)
library(ArchR)

rcorces avatar Aug 25 '22 03:08 rcorces

Previously I did not re-run import10xFeatureMatrix() and addGeneExpressionMatrix(), they did not work. This time I installed the dev_geneMismatch branch as below and re-run import10xFeatureMatrix(), addGeneExpressionMatrix(), and addPeak2GeneLinks(). They all work. Thanks so much!

devtools::install_github("GreenleafLab/ArchR", ref="dev_geneMismatch", repos = BiocManager::repositories()) #to unload a package and reload detach("package:ArchR", unload=TRUE) library(ArchR)

fe4960 avatar Aug 25 '22 23:08 fe4960

This has been merged into dev and will become part of release_1.0.3

rcorces avatar Aug 27 '22 03:08 rcorces

Hey @rcorces any idea when 1.0.3 would be released?

abs51295 avatar Oct 11 '22 23:10 abs51295

Approximately 1 month from now but I've been saying that for 2 months already. You can always work off of dev. Timelines are hard for me to estimate.

rcorces avatar Oct 11 '22 23:10 rcorces