scanpy

KeyError: 1 in read_10x_mtx if genes.tsv has only one column

Open · brianpenghe opened this issue 3 years ago · 8 comments

I have a similar issue to this comment.

Carraro=sc.read_10x_mtx('/mnt/Carraro',var_names='gene_ids')

Switching to `var_names='gene_symbols'` didn't work either.

Error messages:

--> This might be very slow. Consider passing `cache=True`, which enables much faster reading from a cache file.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_29519/245170133.py in <module>
----> 1 Carraro=sc.read_10x_mtx('/mnt/Carraro',var_names='gene_ids')

~/miniconda3/envs/flng/lib/python3.8/site-packages/scanpy/readwrite.py in read_10x_mtx(path, var_names, make_unique, cache, cache_compression, gex_only)
    452     genefile_exists = (path / 'genes.tsv').is_file()
    453     read = _read_legacy_10x_mtx if genefile_exists else _read_v3_10x_mtx
--> 454     adata = read(
    455         str(path),
    456         var_names=var_names,

~/miniconda3/envs/flng/lib/python3.8/site-packages/scanpy/readwrite.py in _read_legacy_10x_mtx(path, var_names, make_unique, cache, cache_compression)
    491     elif var_names == 'gene_ids':
    492         adata.var_names = genes[0].values
--> 493         adata.var['gene_symbols'] = genes[1].values
    494     else:
    495         raise ValueError("`var_names` needs to be 'gene_symbols' or 'gene_ids'")

~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

~/miniconda3/envs/flng/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 1

Any ideas?

brianpenghe · Nov 18 '21 23:11

It seems to be related to genes.tsv: when I replaced it with a different genes.tsv, no error was raised.

brianpenghe · Nov 19 '21 00:11

@brianpenghe, do you have a copy of your original file? Any idea what could have been different?

@dn-ra, would you be able to share the first couple lines of your file, and let me know how it was generated?

ivirshup · Dec 01 '21 17:12

I think I found the cause: when genes.tsv has only one column, read_10x_mtx fails with this error.

Thanks!

brianpenghe · Dec 02 '21 16:12
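
The traceback above points at the mechanism: when genes.tsv exists, read_10x_mtx dispatches to `_read_legacy_10x_mtx`, whose `var_names='gene_ids'` branch runs `adata.var['gene_symbols'] = genes[1].values`. A one-column file has no column 1, hence `KeyError: 1`; the `gene_symbols` branch presumably reads column 1 as well, which would explain why switching did not help. Below is a minimal stdlib sketch for checking a file before reading it. The helper names `count_tsv_columns` and `check_legacy_genes_file` are hypothetical (not part of scanpy), and the check only inspects the first non-empty line.

```python
import csv


def count_tsv_columns(path):
    """Return the number of tab-separated fields on the first non-empty line.

    Hypothetical helper: assumes every line has the same number of columns,
    as in a 10x genes.tsv file.
    """
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if row:  # csv.reader yields [] for blank lines; skip them
                return len(row)
    return 0  # empty file


def check_legacy_genes_file(path):
    """Raise a readable error instead of pandas' bare `KeyError: 1`."""
    n = count_tsv_columns(path)
    if n < 2:
        raise ValueError(
            f"{path} has {n} column(s); the legacy 10x reader expects "
            "gene IDs in column 0 and gene symbols in column 1"
        )
```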

@brianpenghe What column did you add to the genes.tsv so that it worked? I currently have a genes.tsv file with one column for the gene names and am getting the same error as you did. Thanks!

mboisvert1 · Jul 11 '22 21:07
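
On the question of which column to add: one option, sketched here under the assumption that the one-column file lists gene symbols in the same order as the rows of matrix.mtx, is to write each symbol twice, tab-separated, so the legacy reader finds both column 0 and column 1. `pad_genes_tsv` is a hypothetical helper, not a scanpy function, and the resulting gene IDs are only copies of the symbols, not Ensembl IDs.

```python
import csv


def pad_genes_tsv(src, dst):
    """Copy a one-column genes.tsv to dst as two identical columns.

    Hypothetical helper. Rows that already have two or more columns are
    written unchanged. Returns the number of rows written.
    """
    n = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in csv.reader(fin, delimiter="\t"):
            if not row:
                continue  # blank lines carry no gene; skip them
            # Duplicate the single symbol into an "ID" column and a "symbol" column
            writer.writerow(row if len(row) >= 2 else [row[0], row[0]])
            n += 1
    return n
```

The padded file would then be saved as genes.tsv alongside copies of barcodes.tsv and matrix.mtx (for example in a new directory) and read with `sc.read_10x_mtx(new_dir, var_names='gene_symbols')`, where `new_dir` stands in for that directory.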

If that’s a case that can happen, we should deal with it. @brianpenghe please share a few lines of the file in a code block.

flying-sheep · Jul 12 '22 08:07

In my case, there were three files: barcodes.tsv, genes.tsv, and matrix.mtx. What didn't work was a genes.tsv that looks like this:

AL627309.1
AL669831.5
LINC00115
FAM41C
AL645608.3
SAMD11
NOC2L
KLHL17
PLEKHN1
PERM1

What worked was a genes.tsv that looks like this:

ENSG00000243485	MIR1302-2HG
ENSG00000237613	FAM138A
ENSG00000186092	OR4F5
ENSG00000238009	AL627309.1
ENSG00000239945	AL627309.3
ENSG00000239906	AL627309.2
ENSG00000241599	AL627309.4
ENSG00000236601	AL732372.1
ENSG00000284733	OR4F29
ENSG00000235146	AC114498.1

So I had to import the data with the latter genes.tsv and then replace var_names with the correct genes.

I noticed that sc.read_10x_mtx can read both .gz and plain-text files and decides on its own which format they are in. Whether the gene file is named genes.tsv or features.tsv also matters.

Any ideas?

brianpenghe · Jul 13 '22 09:07
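
On the file-name point, the traceback at the top of the thread shows how read_10x_mtx picks a code path in that scanpy version: it checks `(path / 'genes.tsv').is_file()` and calls `_read_legacy_10x_mtx` if the file exists, otherwise `_read_v3_10x_mtx`. A stdlib sketch that mirrors only that check; `which_10x_reader` is a hypothetical name, and other scanpy releases may dispatch differently.

```python
from pathlib import Path


def which_10x_reader(path):
    """Mirror the dispatch shown in the traceback (hypothetical helper).

    A plain 'genes.tsv' selects the legacy reader, which indexes gene
    columns 0 and 1; anything else falls through to the v3 reader.
    """
    genefile_exists = (Path(path) / "genes.tsv").is_file()
    return "legacy" if genefile_exists else "v3"
```

A result of "legacy" together with a one-column genes.tsv is the combination that produced `KeyError: 1` in this thread.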

I've fixed the error I was getting, which was posted on another issue and referenced here. Here's the solution that worked for me: https://github.com/scverse/scanpy/issues/1916#issuecomment-1286404697

dn-ra · Oct 21 '22 03:10