Simulate errors out when celltype names are only numbers, requires text prefix to run correctly.

Open nagendraKU opened this issue 3 years ago • 1 comments

I ran scaden simulate with the Leiden cluster numbers as celltype names. I got the following error message, and the data.h5ad file was not created.

INFO Datasets: ['testdata_all_bat'] bulk_simulator.py:84
INFO Simulating data from testdata_all_bat bulk_simulator.py:89
INFO Loading testdata_all_bat dataset ... bulk_simulator.py:141
INFO Merging unknown cell types: ['unknown'] bulk_simulator.py:107
INFO Subsampling testdata_all_bat ... bulk_simulator.py:110
/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
... storing 'ds' as categorical
Traceback (most recent call last):
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 247, in write_dataframe
    col_names = [check_key(c) for c in df.columns]
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 247, in <listcomp>
    col_names = [check_key(c) for c in df.columns]
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/utils.py", line 109, in check_key
    raise TypeError(f"{key} of type {typ} is an invalid key. Should be str.")
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ku_user/scadendl/bin/scaden", line 8, in <module>
    sys.exit(main())
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/main.py", line 48, in main
    cli()
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/main.py", line 215, in simulate
    fmt=data_format,
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/simulate.py", line 22, in simulation
    bulk_simulator.simulate()
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/simulation/bulk_simulator.py", line 90, in simulate
    self.simulate_dataset(dataset)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/simulation/bulk_simulator.py", line 130, in simulate_dataset
    ann_data.write(os.path.join(self.out_dir, dataset + ".h5ad"))
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_core/anndata.py", line 1911, in write_h5ad
    as_dense=as_dense,
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
    write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
  File "/usr/lib64/python3.6/functools.py", line 807, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/utils.py", line 216, in func_wrapper
    ) from e
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

I then prepended "celltype_" to the Leiden cluster numbers (e.g. celltype_13) in the celltype file, and simulate now runs correctly, generating the data.h5ad file. I still get the following warning message, though.

/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
... storing 'ds' as categorical
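
For anyone hitting the same error, the prefixing workaround can be scripted rather than done by hand. The following is only a rough sketch, not part of Scaden: the helper names, the tab-separated layout, and the "Celltype" column name are assumptions about the celltype file and may need adjusting to match your data.

```python
import csv


def prefix_numeric_labels(labels, prefix="celltype_"):
    """Return labels as strings, prefixing any label made up only of digits."""
    fixed = []
    for label in labels:
        text = str(label).strip()
        # Leiden cluster ids such as 0, 1, 13 become celltype_0, celltype_1, ...
        fixed.append(prefix + text if text.isdigit() else text)
    return fixed


def rewrite_celltype_file(in_path, out_path, column="Celltype", prefix="celltype_"):
    """Copy a tab-separated celltype file, prefixing numeric labels.

    The column name "Celltype" is an assumption; change it if your file differs.
    """
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)

    new_labels = prefix_numeric_labels([row[column] for row in rows], prefix)
    for row, label in zip(rows, new_labels):
        row[column] = label

    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a new file instead of overwriting the original keeps the untouched cluster numbers available for later comparison.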

nagendraKU, Jul 02 '21 11:07

Hi @nagendraKU,

thanks for reporting that. Yes, using only numbers can cause problems. I will try to catch that and issue a better warning. You can ignore the ImplicitModificationWarning, though; that shouldn't cause a problem.
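
A check of the kind mentioned above could look roughly like the sketch below. This is not Scaden's actual code; the function names and the error message are hypothetical and only illustrate catching purely numeric celltype names before the .h5ad file is written.

```python
def find_numeric_celltypes(labels):
    """Return the labels that are ints or strings consisting only of digits."""
    bad = []
    for label in labels:
        if isinstance(label, int):
            bad.append(label)
        elif isinstance(label, str) and label.strip().isdigit():
            bad.append(label)
    return bad


def check_celltype_names(labels):
    """Raise a readable error instead of failing later while writing the file."""
    bad = find_numeric_celltypes(labels)
    if bad:
        raise ValueError(
            "Celltype names must not be purely numeric: "
            f"{bad}. Add a text prefix, e.g. 'celltype_13'."
        )
```

Running such a check right after the celltype labels are loaded would surface the problem before any simulation work is done.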

KevinMenden, Jul 02 '21 12:07