Error: Missing required datasets 'levels' and 'values'

Open Diennguyen8290 opened this issue 3 years ago • 42 comments

Hi,

Thanks for developing this great tool.

I'm running into an error at the LoadH5Seurat() step: Error: Missing required datasets 'levels' and 'values'.

My data was downloaded from here: https://drive.google.com/file/d/1IwWcn4W-YKgNbz4DpNweM2cKxlx1hbM0/view

My scripts:

Convert("COVID19_ALL.h5ad", dest = "h5seurat", overwrite = TRUE, verbose = TRUE)
data <- LoadH5Seurat("COVID19_ALL.h5seurat")

Could you please have a look?

Many thanks.

Regards, Dien

Diennguyen8290 avatar Apr 05 '22 23:04 Diennguyen8290

Same issue here even with a pretty threadbare adata object.

pbmc <- LoadH5Seurat("/NatMmergeHarmonyRT3.h5seurat")
Validating h5Seurat file
Initializing RNA with data
Adding counts for RNA
Adding feature-level metadata for RNA
Adding command information
Adding cell-level metadata
Error: Missing required datasets 'levels' and 'values'

JBreunig avatar Apr 05 '22 23:04 JBreunig

Thinking this may be due to recent updates to anndata as my postdoc is able to convert his adata objects back and forth without issue but couldn't convert my h5ad file.

JBreunig avatar Apr 07 '22 19:04 JBreunig

Same issue, please help! What can we do but wait?

clc37 avatar Apr 10 '22 09:04 clc37

same issue.............

edtim8 avatar Apr 21 '22 15:04 edtim8

I'm having the same issue converting an anndata h5ad that came from version 0.7.8. @JBreunig what version of anndata is your postdoc using?

kmh005 avatar Apr 28 '22 18:04 kmh005

Same issue here (anndata 0.8.0, scanpy 1.9.1). If I download the h5ad file used in vignettes/convert-anndata.Rmd it works. If I read it with scanpy and write it back to h5ad I get the same error.

To reproduce:

This works fine (R):

library(Seurat)
library(SeuratDisk)

url <- "https://seurat.nygenome.org/pbmc3k_final.h5ad"
curl::curl_download(url, basename(url))

Convert("pbmc3k_final.h5ad", dest = "h5seurat", overwrite = TRUE)
pbmc3k <- LoadH5Seurat("pbmc3k_final.h5seurat")

If I read the same file with scanpy (Python):

in[1]: import scanpy as sc

in[2]: cells = sc.read_h5ad('pbmc3k_final.h5ad')

C:\Users\me\anaconda3\envs\BearOmics\lib\site-packages\anndata\compat\__init__.py:232: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].
This is where adjacency matrices should go now.
  warn(
C:\Users\me\anaconda3\envs\BearOmics\lib\site-packages\anndata\compat\__init__.py:232: FutureWarning: Moving element from .uns['neighbors']['connectivities'] to .obsp['connectivities'].
This is where adjacency matrices should go now.
  warn(

in[3]: cells.write('from_p.h5ad')

Then in R:

> Convert("from_p.h5ad", dest = "h5seurat", overwrite = TRUE)

Warning: Unknown file type: h5ad
Warning: 'assay' not set, setting to 'RNA'
Creating h5Seurat file for version 3.1.5.9900
Adding X as scale.data
Adding raw/X as data
Adding raw/X as counts
Adding meta.features from raw/var
Adding dispersions from scaled feature-level metadata
Adding dispersions_norm from scaled feature-level metadata
Merging gene_ids from scaled feature-level metadata
Adding highly_variable from scaled feature-level metadata
Adding means from scaled feature-level metadata
Merging n_cells from scaled feature-level metadata
Adding X_pca as cell embeddings for pca
Adding X_umap as cell embeddings for umap
Adding PCs as feature loadings fpr pca
Adding miscellaneous information for pca
Adding standard deviations for pca
Adding miscellaneous information for umap
Adding leiden to miscellaneous data

> pbmc3k <- LoadH5Seurat("from_p.h5seurat")

Validating h5Seurat file
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing RNA with data
Adding counts for RNA
Adding scale.data for RNA
Adding feature-level metadata for RNA
Adding reduction pca
Adding cell embeddings for pca
Adding feature loadings for pca
Adding miscellaneous information for pca
Adding reduction umap
Adding cell embeddings for umap
Adding miscellaneous information for umap
Adding command information
Adding cell-level metadata

Error: Missing required datasets 'levels' and 'values'

rockdeme avatar Apr 28 '22 20:04 rockdeme

Replying to @kmh005: "I'm having the same issue converting an anndata h5ad that came from version 0.7.8. @JBreunig what version of anndata is your postdoc using?"

0.7.6 and 1.8.1

JBreunig avatar Apr 29 '22 00:04 JBreunig

I also get the Error: Missing required datasets 'levels' and 'values' when adding cell-level metadata whilst using LoadH5Seurat(). I am using the latest version of SeuratDisk_0.0.0.9020 and anndata 0.8.0.

Did anyone manage to solve this issue?

tomthun avatar Apr 29 '22 13:04 tomthun

I got the same issue. I saw everyone mentioning the anndata version, so I tried 0.7.5 and there seems to be no issue for now.
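
For anyone who wants to check their environment before re-exporting, here is a minimal Python sketch. It assumes, based only on the reports in this thread, that anndata 0.8 and later write the layout that LoadH5Seurat rejects. The cutoff `first_failing` and all three helper names are made up for illustration, and one report above involves a file from 0.7.8, so treat the cutoff as a guess rather than a documented boundary.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.7.5.2' into (0, 7, 5, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def writes_newer_layout(version_text, first_failing=(0, 8)):
    """True if the version is at or above the ASSUMED cutoff from this thread."""
    return parse_version(version_text) >= first_failing


def installed_anndata_version():
    """Return the installed anndata version string, or None if it is absent."""
    try:
        return metadata.version("anndata")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_anndata_version()
    if found is None:
        print("anndata is not installed in this environment")
    elif writes_newer_layout(found):
        print(f"anndata {found}: files written here may hit the error")
    else:
        print(f"anndata {found}: reported as working in this thread")
```

Keeping the cutoff as a parameter makes it easy to adjust if your own experience differs from the versions reported here.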

michaeleekk avatar May 01 '22 07:05 michaeleekk

Ok, I had the same issue, but managed to load the file. My solution was just setting two flags to FALSE:

my_obj <- LoadH5Seurat("my_obj.h5seurat", meta.data = FALSE, misc = FALSE)

The initial error was this:

Validating h5Seurat file
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing RNA with data
Adding counts for RNA
Adding feature-level metadata for RNA
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing combat_corrected with data
Adding counts for combat_corrected
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing exon_reads with data
Adding counts for exon_reads
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing exon_umis with data
Adding counts for exon_umis
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing intron_reads with data
Adding counts for intron_reads
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing intron_umis with data
Adding counts for intron_umis
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing starcounts with data
Adding counts for starcounts
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Initializing umi_merge with data
Adding counts for umi_merge
Adding reduction pca
Adding cell embeddings for pca
Adding feature loadings for pca
Adding miscellaneous information for pca
Adding reduction umap
Adding cell embeddings for umap
Adding miscellaneous information for umap
Adding command information
Adding cell-level metadata
Error: Missing required datasets 'levels' and 'values'

When I set meta.data to FALSE, the error "Missing required datasets 'levels' and 'values'" was "solved", but another one appeared: Error in if (!x[[i]]$dims) { : argument is of length zero. That one was solved by setting misc to FALSE. Honestly, I don't know what it was about, but at least I got my object. Hope this helps someone :)

--Update--

Ok, obviously, when you set meta.data to FALSE, some metadata will not load. But I figured out that this may affect only the "pure" metadata stored in "obs". So here is an addition for loading it with a different package:

library(rhdf5)
my_obj[["missed_meta_value"]] <- h5read("my_obj.h5ad", "/obs/missed_meta_value")

And if you want to see the structure of the h5ad file (with R), you can use an ls-like command, also from this package:

h5ls("my_obj.h5ad")

So it seems that at this point only the "misc" information is missing, but in my case there was none.
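
To see which obs columns are affected without loading the whole object, something like the following Python sketch could work. `classify_columns` is a hypothetical helper, not part of any package. It assumes each categorical column is stored as an HDF5 group holding either `categories` and `codes` (the names reported further down in this thread) or `levels` and `values` (the names in the error message). The demo runs on plain nested dicts so it needs no HDF5 library.

```python
def classify_columns(obs_group):
    """Report how each entry of an obs-like mapping stores its values.

    Works on nested dicts and, by duck typing, on objects that support
    .keys(), item access, and the `in` operator in the same way.
    """
    report = {}
    for name in obs_group.keys():
        node = obs_group[name]
        if not hasattr(node, "keys"):
            # Leaf node (for example a plain numeric column).
            report[name] = "dataset"
        elif "categories" in node and "codes" in node:
            report[name] = "categories/codes"
        elif "levels" in node and "values" in node:
            report[name] = "levels/values"
        else:
            report[name] = "other group"
    return report


# Stand-in for the /obs group of a file, using plain dicts instead of HDF5.
example_obs = {
    "n_genes": [2001, 1873],
    "cell_type": {"categories": ["B", "T"], "codes": [0, 1]},
    "batch": {"levels": ["a", "b"], "values": [1, 2]},
}
print(classify_columns(example_obs))
```

With h5py installed (an assumption), the same function should also accept the obs group of an open file, for example `h5py.File("my_obj.h5ad", "r")["obs"]`, since h5py groups expose the same mapping interface.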

gleb-gavrish avatar May 02 '22 08:05 gleb-gavrish

Alternatively, you can use zellkonverter to read in your anndata as a SingleCellExperiment with

ad <- readH5AD("path_to/example.h5ad")

Then you can use Seurat's as.Seurat() function to convert the object to Seurat. I also had to override the default counts and data parameters to fit my data, e.g.

adata_Seurat <- as.Seurat(ad, counts = "X", data = NULL)

You can find the name of your counts by printing ad and looking under assays. If you have logcounts, you need to do the same for data and reference the correct assay name. I hope that helps! :)

tomthun avatar May 03 '22 10:05 tomthun

Same issue, but it looks like the package is not maintained anymore.

naity2 avatar May 05 '22 20:05 naity2

I have been struggling with this for months. That pisses me off. Could anyone troubleshoot it? It's unbelievable that I'm still stuck on it.

bclopesrs avatar May 25 '22 19:05 bclopesrs

Replying to @bclopesrs: "I have been struggling with this for months. That pisses me off. Could anyone troubleshoot it? It's unbelievable that I'm still stuck on it."

As an alternative, you can use the anndata package for R and build the Seurat object from that.

rockdeme avatar May 25 '22 20:05 rockdeme

Worked around this with @rockdeme's suggestion. Just use anndata, you'll have to get reticulate and other dependencies but it's worth it.

Example:

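library("Seurat")
library("anndata")

print("Convert from Scanpy to Seurat...")
# Read the h5ad file through the anndata R package (requires reticulate)
data <- read_h5ad("example.h5ad")
# AnnData stores cells x genes, while Seurat expects genes x cells, hence t()
data <- CreateSeuratObject(counts = t(data$X), meta.data = data$obs)
str(data)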

AmirAlavi avatar May 25 '22 20:05 AmirAlavi

Thank you guys for your support but...

Still getting the same error: "Missing required datasets 'levels' and 'values'"

Here is my code:

Convert("full_integrated_test.h5ad", dest = "h5seurat", overwrite = TRUE, verbose = T)
test01 <- LoadH5Seurat("full_integrated_test.h5seurat", array = 'RNA')
test01

bettega = LoadH5Seurat("full_integrated_test.h5seurat")
sce = as.SingleCellExperiment(bettega)
scConf = createConfig(sce)
makeShinyApp(sce, scConf, gene.mapping = TRUE, shiny.title = "ShinyCell Quick Start")

bclopesrs avatar May 25 '22 20:05 bclopesrs

Replying to @michaeleekk: "I got the same issue. I saw everyone mentioning anndata version and so I tried 0.7.5 and seems to be no issue for now."

Same issue here, using anndata 0.7.5 in python 3.9.13 resolved the issue.

YY-SONG0718 avatar Jul 19 '22 12:07 YY-SONG0718

I tried SeuratDisk and sceasy; both gave me error messages.

@tomthun's method worked for me: use zellkonverter to read in your anndata as a SingleCellExperiment, then convert the SCE to Seurat.

library(zellkonverter)
sce1=readH5AD("my.h5ad", verbose = TRUE)
adata_Seurat <- as.Seurat(sce1, counts = "X", data = NULL)

gt7901b avatar Aug 06 '22 00:08 gt7901b

Try this. In scanpy:

del adata.var
del adata.obs

then save the h5ad file.

In R, use Convert to convert the h5ad file to h5seurat.

This method worked for me.

You can then add the metadata back in R.

niehu2018 avatar Aug 20 '22 08:08 niehu2018

Replying to @kmh005 ("I'm having the same issue converting an anndata h5ad that came from version 0.7.8. @JBreunig what version of anndata is your postdoc using?") and @JBreunig ("0.7.6 and 1.8.1"):

I had the same problem, and only this solution worked (downgrade anndata to 0.7.6 and rewrite the h5ad file). Actually, I also had to transpose the object twice before rewriting to get an h5ad file that Seurat accepts, following the solution from this thread: https://github.com/satijalab/seurat/issues/1689

adata.T.T.write_h5ad("test.h5ad")

ghar1821 avatar Sep 30 '22 04:09 ghar1821

https://github.com/mojaveazure/seurat-disk/issues/109#issuecomment-1137812860

Hey,

I had the same problem and was able to fix it without downgrading anndata or switching packages.

I did it like this.

Python:

# Write the anndata metadata to CSV files
peri.write_csvs("peri", skip_data=False)

# Write the anndata object to h5ad
peri.write_h5ad("~/peri.h5ad")

R (Seurat):

# Convert h5ad to h5seurat
Convert("~/Data/scRNA/merged/peri.h5ad", dest="h5seurat", overwrite = TRUE)

peri_meta <- read.csv("~/peri/obs.csv")

# Use the cell barcodes as the metadata row names
rownames(peri_meta) <- peri_meta$X

# Keep every column except the cell barcodes (columns 2 to 146 in my data)
peri_meta <- peri_meta[,c(2:146)]

# Then load the Seurat object as suggested earlier by @gleb-gavrish
peri = LoadH5Seurat("~/peri.h5seurat", meta.data = FALSE, misc = FALSE)

# This still does not work:
# peri[["Condition"]] <- h5read("~/peri.h5ad", "/obs/Condition")

# Finally, add the metadata to the object

peri <- AddMetaData(
  object = peri,
  metadata = peri_meta)

Cheers, Sonik

pinunQ avatar Nov 09 '22 01:11 pinunQ

Replying to @AmirAlavi's anndata example above (read_h5ad followed by CreateSeuratObject):

That works simply, many thanks.

UboCA avatar Nov 21 '22 14:11 UboCA

I also had trouble with this. I recommend cloning the GitHub repository, adding browser() calls where the error comes from (or debugging in place), and using devtools::load_all to reload your modifications and see whether they work.

For me, the main cause of this error was scanpy categoricals. I don't understand why some categorical variables use categories + codes and others use levels + values as the HDF5 dataset names, but it was fixable. Additionally, PAGA networks weren't supported and I didn't need them, so I disabled that:

diff --git a/R/ReadH5.R b/R/ReadH5.R
index 4c169de..6075020 100644
--- a/R/ReadH5.R
+++ b/R/ReadH5.R
@@ -145,6 +145,14 @@ setMethod(
   f = 'as.factor',
   signature = c('x' = 'H5Group'),
   definition = function(x) {
+    if (x$exists(name = 'categories') && x$exists(name = 'codes')) {
+      # stop("Missing required datasets 'levels' and 'codes'", call. = FALSE)
+      ret = as.factor(x[["codes"]][])
+      levels(ret) = x[["categories"]][]
+      print(length(ret))
+      return(ret)
+      # arguments imply differing number of rows: 770951, 5965
+    }
     if (!x$exists(name = 'levels') || !x$exists(name = 'values')) {
       stop("Missing required datasets 'levels' and 'values'", call. = FALSE)
     }
@@ -245,7 +253,11 @@ setMethod(
         } else if (IsMatrix(x = x[[i]])) {
           as.matrix(x = x[[i]], ...)
         } else {
-          as.list(x = x[[i]], ...)
+          if(i == "paga"){
+            list("nada")
+          }else{
+            as.list(x = x[[i]], ...)
+          }
         }
       }
     }
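
For reference, a small Python sketch of what decoding the categories + codes layout involves. It assumes the pandas convention that codes are zero-based positions into categories and that a negative code marks a missing value (the R snippet by @bobermayer in this thread handles the same two cases with `+ 1` and `NA`). `decode_categorical` is a made-up name used only for illustration.

```python
def decode_categorical(codes, categories):
    """Map zero-based integer codes to labels; negative codes become None."""
    labels = []
    for code in codes:
        if code < 0:
            # A negative code means the value was missing for this cell.
            labels.append(None)
        else:
            labels.append(categories[code])
    return labels


# Four cells, three possible labels, one missing value.
print(decode_categorical([0, 2, -1, 1], ["B", "NK", "T"]))
```

Any port of this to R has to shift the codes by one, because R factors index their levels starting at 1.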

JZL avatar Feb 07 '23 05:02 JZL

I found that any string column in the obs DataFrame causes this issue. I drop all string columns before writing and re-add them to the Seurat object in R.
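
A rough sketch of that idea on the Python side, assuming pandas is available and that obs is an ordinary DataFrame. `split_text_columns` is a hypothetical helper. It also treats categorical columns as text, on the assumption (from the comments above) that those are the columns that end up as categories/codes on disk.

```python
import pandas as pd


def split_text_columns(obs):
    """Return (obs without text-like columns, the removed columns)."""
    text_cols = [
        name
        for name in obs.columns
        if isinstance(obs[name].dtype, pd.CategoricalDtype)
        or pd.api.types.is_object_dtype(obs[name])
        or pd.api.types.is_string_dtype(obs[name])
    ]
    return obs.drop(columns=text_cols), obs[text_cols]


# Toy stand-in for adata.obs: one numeric, one categorical, one string column.
obs = pd.DataFrame(
    {
        "n_genes": [2001, 1873],
        "cell_type": pd.Categorical(["B", "T"]),
        "sample": ["s1", "s2"],
    },
    index=["cell1", "cell2"],
)
kept, removed = split_text_columns(obs)
print(list(kept.columns))
print(list(removed.columns))
```

The removed columns could then be written out with `removed.to_csv(...)` and re-attached in R with AddMetaData, as described earlier in this thread.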

cchd0001 avatar Apr 26 '23 06:04 cchd0001

@UboCA, @rockdeme and @AmirAlavi: a question about the t() call inside CreateSeuratObject(). This step threw an error saying my adata$X is not a matrix. Do I need to convert the dgRMatrix to a matrix first?

Thank you

adata_Seurat <- CreateSeuratObject(counts = t(adata$X), meta.data = cadata$obs)

Error in t.default(count302_Ton230240246_CD8R5posneg_chrMTGTF_concat_adata$X) : 
  argument is not a matrix

denvercal1234GitHub avatar May 04 '23 17:05 denvercal1234GitHub

@ghar1821 - Did you mean that in Python you downgraded anndata to 0.7.6 and then ran adata.T.T.write_h5ad before using Convert and LoadH5Seurat in R?

I did that, and in R, when I ran LoadH5Seurat, it said "Warning: Invalid name supplied, making object name syntactically valid. New object name is ClustersX_XX_Ybatch; see ?make.names for more details on syntax validity", followed by "Adding miscellaneous information" and "Adding tool-specific results". Do you know if this is normal?

BTW, just running adata.write_h5ad appeared to produce the same result as with .T.T.

> adata_Seurat <- LoadH5Seurat("........_Objects/concat_adata.h5seurat")

Validating h5Seurat file
Initializing RNA with data
Adding counts for RNA
Adding feature-level metadata for RNA
Initializing ambiguous with data
Adding counts for ambiguous
Initializing matrix with data
Adding counts for matrix
Initializing spliced with data
Adding counts for spliced
Initializing unspliced with data
Adding counts for unspliced
Adding command information
Adding cell-level metadata
Warning: Invalid name supplied, making object name syntactically valid. New object name is ClustersX_XX_Ybatch; see ?make.names for more details on syntax validity
Adding miscellaneous information
Adding tool-specific results

denvercal1234GitHub avatar May 04 '23 17:05 denvercal1234GitHub

hi @pinunQ - peri = LoadH5Seurat("~/peri.h5seurat", meta.data = FALSE, misc = FALSE) still gave the same error. Do you know why?

denvercal1234GitHub avatar May 04 '23 17:05 denvercal1234GitHub

Replying to @gleb-gavrish's workaround above (LoadH5Seurat with meta.data = FALSE, misc = FALSE):

Hi, thanks for your solution. I think this is an inherent incompatibility between Seurat and anndata. When anndata writes an h5ad file with write_h5ad, it converts all string variables into categorical variables, which Seurat then processes as factors. LoadH5Seurat calls an internal as.Seurat function in R, which in turn calls as.factor. as.factor expects to work on a vector, but as.Seurat passes it an HDF5 group instead, which causes the problem. I think the team should fix this.
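
To illustrate the string-to-categorical step described here, a minimal pure-Python sketch. It mimics the pandas convention (sorted unique labels plus one integer code per value, with -1 for missing) on a toy column; it is not anndata's own code, and `encode_categorical` is a name invented for this example.

```python
def encode_categorical(values):
    """Split a column of strings into (categories, codes).

    categories: sorted unique labels; codes: one zero-based index per value,
    with -1 standing in for a missing value (None).
    """
    categories = sorted({v for v in values if v is not None})
    position = {label: i for i, label in enumerate(categories)}
    codes = [position[v] if v is not None else -1 for v in values]
    return categories, codes


# A toy obs column with one missing entry.
categories, codes = encode_categorical(["T", "B", None, "T"])
print(categories)
print(codes)
```

A reader that only looks for `levels` and `values` would find neither of these two arrays, which matches the error message in this issue.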

shuailinli avatar May 20 '23 20:05 shuailinli

For anyone's information, installing anndata==0.7.5 worked for me, as mentioned before.

st-tky avatar May 24 '23 15:05 st-tky

I was able to use @gleb-gavrish's solution and then add meta data like so (accounting for NA's in some of my categorical obs columns):

library(rhdf5)
sobj <- LoadH5Seurat(file="my_object.h5seurat",  meta.data = FALSE, misc = FALSE)
obs <- h5read("my_object.h5seurat", "/meta.data")

meta <- data.frame(lapply(names(obs), function(x) { 
  if (length(obs[[x]])==2) 
    obs[[x]][['categories']][ifelse(obs[[x]][['codes']] >= 0, obs[[x]][['codes']] + 1, NA)]
  else 
    as.numeric(obs[[x]])
}
), row.names=Cells(sobj))
colnames(meta) <- names(obs)

sobj <- AddMetaData(sobj,meta)

bobermayer avatar Jun 05 '23 11:06 bobermayer

Replying to @AmirAlavi's anndata example above (read_h5ad followed by CreateSeuratObject):

This didn't work for me (anndata v0.7.5.2, Seurat v4.1.0, R v4.0.3), but the zellkonverter solution suggested by @tomthun earlier in this thread (https://github.com/mojaveazure/seurat-disk/issues/109#issuecomment-1115959604) does the job nicely.

george-hall-ucl avatar Jun 27 '23 10:06 george-hall-ucl