scanpy
KeyError: 1 in read_10x_mtx if genes.tsv has only one column
I have a similar issue to this comment.
Carraro=sc.read_10x_mtx('/mnt/Carraro',var_names='gene_ids')
Switching var_names to 'gene_symbols' didn't work either.
Error messages:
--> This might be very slow. Consider passing `cache=True`, which enables much faster reading from a cache file.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/tmp/ipykernel_29519/245170133.py in <module>
----> 1 Carraro=sc.read_10x_mtx('/mnt/Carraro',var_names='gene_ids')
~/miniconda3/envs/flng/lib/python3.8/site-packages/scanpy/readwrite.py in read_10x_mtx(path, var_names, make_unique, cache, cache_compression, gex_only)
452 genefile_exists = (path / 'genes.tsv').is_file()
453 read = _read_legacy_10x_mtx if genefile_exists else _read_v3_10x_mtx
--> 454 adata = read(
455 str(path),
456 var_names=var_names,
~/miniconda3/envs/flng/lib/python3.8/site-packages/scanpy/readwrite.py in _read_legacy_10x_mtx(path, var_names, make_unique, cache, cache_compression)
491 elif var_names == 'gene_ids':
492 adata.var_names = genes[0].values
--> 493 adata.var['gene_symbols'] = genes[1].values
494 else:
495 raise ValueError("`var_names` needs to be 'gene_symbols' or 'gene_ids'")
~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 1
Any ideas?
It seems to be something about the genes.tsv. I replaced it with another genes.tsv and it didn't produce errors.
@brianpenghe, do you have a copy of your original file? Any idea what could have been different?
@dn-ra, would you be able to share the first couple lines of your file, and let me know how it was generated?
I think I found the cause: when genes.tsv has only one column, the read fails with this error.
Thanks!
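For anyone who wants to check their own file before calling read_10x_mtx, here is a minimal sketch that counts the tab-separated columns in the first few rows. It uses only the standard library; the helper name count_tsv_columns and the demo file are made up for illustration and are not part of scanpy.

```python
import csv
import tempfile
from pathlib import Path


def count_tsv_columns(tsv_path, max_rows=10):
    """Return the set of column counts seen in the first few rows of a TSV file."""
    counts = set()
    with open(tsv_path, newline="") as handle:
        for i, row in enumerate(csv.reader(handle, delimiter="\t")):
            if i >= max_rows:
                break
            counts.add(len(row))
    return counts


# Demo on a throwaway file that mimics a one-column genes.tsv (hypothetical data).
demo_dir = Path(tempfile.mkdtemp())
one_col = demo_dir / "genes.tsv"
one_col.write_text("AL627309.1\nFAM41C\nSAMD11\n")

# If the result contains 1, the genes[1] lookup in the traceback has no column to read.
print(count_tsv_columns(one_col))
```

If the set contains anything other than 2, the file probably does not match the ID-plus-symbol layout the legacy reader indexes into.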
@brianpenghe What column did you add to the genes.tsv so that it worked? I currently have a genes.tsv file with one column for the gene names and am getting the same error as you did. Thanks!
If that’s a case that can happen, we should deal with it. @brianpenghe please share a few lines of the file in a code block.
In my case, there were three files:
barcodes.tsv
genes.tsv
matrix.mtx
What didn't work was a genes.tsv that looks like this:
AL627309.1
AL669831.5
LINC00115
FAM41C
AL645608.3
SAMD11
NOC2L
KLHL17
PLEKHN1
PERM1
What worked was a genes.tsv that looks like this:
ENSG00000243485 MIR1302-2HG
ENSG00000237613 FAM138A
ENSG00000186092 OR4F5
ENSG00000238009 AL627309.1
ENSG00000239945 AL627309.3
ENSG00000239906 AL627309.2
ENSG00000241599 AL627309.4
ENSG00000236601 AL732372.1
ENSG00000284733 OR4F29
ENSG00000235146 AC114498.1
So I had to import the data with the latter genes.tsv and then replace the var_names with the correct genes.
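A possible alternative to borrowing another genes.tsv, sketched under the assumption that the single column can stand in for both the gene ID and the gene symbol: write a padded copy in which each name is repeated in a second column. This is only an illustration, not necessarily what the linked fix does, and pad_genes_tsv is a hypothetical helper, not a scanpy function.

```python
import csv
import tempfile
from pathlib import Path


def pad_genes_tsv(src, dst):
    """Copy src to dst, repeating the single column so each row has two fields.

    src and dst must be different paths; rows that already have two or more
    fields are copied unchanged.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in csv.reader(fin, delimiter="\t"):
            if len(row) == 1:
                row = [row[0], row[0]]  # assumption: same name serves as ID and symbol
            writer.writerow(row)


# Demo with throwaway files (hypothetical data).
work_dir = Path(tempfile.mkdtemp())
original = work_dir / "genes_one_column.tsv"
original.write_text("AL627309.1\nFAM41C\n")
padded = work_dir / "genes.tsv"
pad_genes_tsv(original, padded)
print(padded.read_text())
```

With the padded file saved as genes.tsv next to barcodes.tsv and matrix.mtx, the genes[1] lookup from the traceback should find a second column, though I have not verified this against every scanpy version.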
I noticed that sc.read_10x_mtx can read both .gz and plain-text files and decides on its own which format it is dealing with. Whether the gene file is named genes.tsv or features.tsv also matters: judging from the traceback, the presence of genes.tsv is what selects the legacy reader.
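The file-name check visible in the traceback above (readwrite.py lines 452-453) can be mirrored in a few lines to see which reader a directory would be routed to. This is a sketch based only on those two lines; guess_10x_reader is a hypothetical helper, and the exact files the v3 reader then expects (gzipped features.tsv.gz, as far as I can tell) may differ between scanpy versions.

```python
import tempfile
from pathlib import Path


def guess_10x_reader(path):
    """Mirror the traceback's check: genes.tsv present -> legacy reader, else v3."""
    path = Path(path)
    if (path / "genes.tsv").is_file():
        return "legacy"
    return "v3"


# Demo with two throwaway directories containing empty placeholder files.
legacy_dir = Path(tempfile.mkdtemp())
(legacy_dir / "genes.tsv").touch()

v3_dir = Path(tempfile.mkdtemp())
(v3_dir / "features.tsv.gz").touch()

print(guess_10x_reader(legacy_dir))
print(guess_10x_reader(v3_dir))
```

Note that the check only looks at the name, so a one-column file called genes.tsv still goes down the legacy path and hits the genes[1] lookup.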
Any ideas?
I've fixed the error I was getting, which was posted on another issue and referenced here. Here's the solution that worked for me: https://github.com/scverse/scanpy/issues/1916#issuecomment-1286404697