STREAM
Map new data when using Seurat object
Hi, I have two precomputed Seurat objects. After inferring a trajectory for one of them, I'd like to map cells from the other onto the precomputed trajectory. However, I get an error when I run the map_new_data() function:
adata_combined = st.map_new_data(WT_adata,Mutant_adata)
Top principal components are being used for mapping ...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-56-f419d4836daf> in <module>
----> 1 adata_combined = st.map_new_data(WT_adata,Mutant_adata)
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in map_new_data(adata_ref, adata_new, use_radius)
4059 print('Top principal components are being used for mapping ...')
4060 if('top_pcs' not in adata_ref.uns_keys()):
-> 4061 raise ValueError("top principal components are not selected yet in `adata_ref`")
4062 trans = adata_ref.uns['top_pcs']
4063 top_pcs_feature = adata_ref.uns['params']['select_top_principal_components']['feature']
ValueError: top principal components are not selected yet in `adata_ref`
The following is my code:
import anndata as ad
import stream as st
# Load the first seurat object
WT_adata = ad.read_loom("./my_WT_object.loom")
st.set_workdir(WT_adata,'./stream_result')
WT_adata.obsm['top_pcs'] = WT_adata.obsm['pca_cell_embeddings']
WT_adata.obsm['X_vis_umap'] = WT_adata.obsm['umap_cell_embeddings']
st.dimension_reduction(WT_adata,method='mlle',feature='top_pcs',n_jobs=4,n_neighbors=30)
st.seed_elastic_principal_graph(WT_adata,n_clusters=10)
st.elastic_principal_graph(WT_adata,epg_alpha=0.05,epg_mu=0.05,epg_lambda=0.05,epg_trimmingradius=0.1)
st.extend_elastic_principal_graph(WT_adata)
st.plot_dimension_reduction(WT_adata,color=['label'],n_components=2,show_graph=True,show_text=True)
st.plot_branches(WT_adata,show_text=True)
# Load the second seurat object
Mutant_adata = ad.read_loom("./my_mutant_object.loom")
Mutant_adata.obsm['top_pcs'] = Mutant_adata.obsm['pca_cell_embeddings']
Mutant_adata.obsm['X_vis_umap'] = Mutant_adata.obsm['umap_cell_embeddings']
# mapping
adata_combined = st.map_new_data(WT_adata,Mutant_adata)
I tried running the select_top_principal_components() function on WT_adata, but I got another error:
st.select_top_principal_components(WT_adata,n_pc=15,first_pc=True)
using all the features ...
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-63-358562d189fb> in <module>
----> 1 st.select_top_principal_components(WT_adata,n_pc=15,first_pc=True)
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in select_top_principal_components(adata, feature, n_pc, max_pc, first_pc, use_precomputed, save_fig, fig_name, fig_path, fig_size, pad, w_pad, h_pad)
1035 else:
1036 print('using all the features ...')
-> 1037 trans = sklearn_pca.fit(adata.X)
1038 X_pca = trans.transform(adata.X)
1039 # X_pca = sklearn_pca.fit_transform(adata.X)
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/sklearn/decomposition/_pca.py in fit(self, X, y)
350 Returns the instance itself.
351 """
--> 352 self._fit(X)
353 return self
354
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/sklearn/decomposition/_pca.py in _fit(self, X)
392 # This is more informative than the generic one raised by check_array.
393 if issparse(X):
--> 394 raise TypeError('PCA does not support sparse input. See '
395 'TruncatedSVD for a possible alternative.')
396
TypeError: PCA does not support sparse input. See TruncatedSVD for a possible alternative.
Could anyone please tell me how to solve this problem? Thanks a lot!
It looks like Mutant_adata.X is a sparse matrix, which caused the error you are seeing. Unfortunately, the current version does not support sparse matrices yet.
One quick fix is to convert the sparse matrices into dense matrices, i.e.
WT_adata.X = WT_adata.X.A
Mutant_adata.X = Mutant_adata.X.A
I tried converting the sparse matrices into dense matrices. However, I got the following error:
WT_adata.X = WT_adata.X.A
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-3d42edc1ccff> in <module>
1 ## convert sparse matrices into dense matrices
----> 2 WT_adata.X = WT_adata.X.A
AttributeError: 'numpy.ndarray' object has no attribute 'A'
Hmmm, they seem to be numpy arrays already, which kind of contradicts the previous error. Is there a dummy dataset you can share to reproduce the issue? I'm happy to take a closer look.
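In the meantime, a conversion that handles both cases might help. This is a generic sketch using scipy.sparse, not part of the STREAM API; it only densifies when the matrix is actually sparse, so it avoids the AttributeError (`.A` only exists on sparse matrices):

```python
import numpy as np
from scipy import sparse

def to_dense(X):
    """Return X as a dense numpy array, whether X is a scipy
    sparse matrix or already a dense array."""
    if sparse.issparse(X):
        return X.toarray()
    return np.asarray(X)

# Works for both representations:
dense = to_dense(np.eye(3))                          # already dense: passed through
also_dense = to_dense(sparse.csr_matrix(np.eye(3)))  # sparse: densified
```

You could then write `WT_adata.X = to_dense(WT_adata.X)` regardless of how the loom file was stored.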
Sorry for the late reply. I reran the scripts and found that converting the sparse matrices into dense matrices works. However, I still encounter another problem when mapping the data.
# mapping
adata_combined = st.map_new_data(WT_adata,Mutant_adata)
Top principal components are being used for mapping ...
method 'mlle' is being used for mapping ...
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-26-f419d4836daf> in <module>
----> 1 adata_combined = st.map_new_data(WT_adata,Mutant_adata)
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in map_new_data(adata_ref, adata_new, use_radius)
4129 shared_var_key = [x for x in adata_new.var_keys() if x in adata_ref.var_keys()]
4130 adata_combined.obs = adata_combined.obs[shared_obs_key+['batch']]
-> 4131 adata_combined.var = adata_combined.var[shared_var_key]
4132 adata_combined.uns['workdir'] = adata_new.uns['workdir']
4133 for key in adata_ref.uns_keys():
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
3028 if is_iterator(key):
3029 key = list(key)
-> 3030 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
3031
3032 # take() does not accept boolean indexers
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1264 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1265
-> 1266 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1267 return keyarr, indexer
1268
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1306 if missing == len(indexer):
1307 axis_name = self.obj._get_axis_name(axis)
-> 1308 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1309
1310 ax = self.obj._get_axis(axis)
KeyError: "None of [Index(['Selected'], dtype='object')] are in the [columns]"
It seems that my adata contains an unexpected var column after conversion from the Seurat object. Here is my adata:
WT_adata
AnnData object with n_obs × n_vars = 6936 × 17612
obs: 'ClusterID', 'ClusterName', 'integrated_snn_res_0_2', 'nCount_RNA', 'nFeature_RNA', 'orig_ident', 'percent_mt', 'rough_cell_type', 'seurat_clusters', 'kmeans', 'node', 'branch_id', 'branch_id_alias', 'branch_lam', 'branch_dist', 'S0_pseudotime', 'S1_pseudotime'
var: 'Selected'
uns: 'workdir', 'var_genes', 'pca_variance_ratio', 'top_pcs', 'params', 'trans_mlle', 'rough_cell_type_color', 'epg', 'flat_tree', 'seed_epg', 'seed_flat_tree', 'ori_epg', 'epg_obj', 'ori_epg_obj', 'branch_id_alias_color', 'stream_S1'
obsm: 'pca_cell_embeddings', 'umap_cell_embeddings', 'top_pcs', 'X_vis_umap', 'var_genes', 'pca', 'X_mlle', 'X_dr', 'X_spring', 'X_stream_S1'
varm: 'pca_feature_loadings'
layers: 'scale_data'
Mutant_adata
AnnData object with n_obs × n_vars = 7216 × 17612
obs: 'ClusterID', 'ClusterName', 'integrated_snn_res_0_2', 'nCount_RNA', 'nFeature_RNA', 'orig_ident', 'percent_mt', 'rough_cell_type', 'seurat_clusters'
var: 'Selected'
obsm: 'pca_cell_embeddings', 'umap_cell_embeddings', 'top_pcs', 'X_vis_umap'
varm: 'pca_feature_loadings'
layers: 'scale_data'
The two anndata objects both look fine to me. It's difficult to understand what's going on without knowing the output of Mutant_adata.var and Mutant_adata.var_keys(). Not sure if you can share a subset of both anndata objects; it would be much easier for me to give some more useful feedback.
Here are subsets of WT_adata and Mutant_adata, both downsampled to 1000 cells. You can download the data from this URL: https://mab.to/5HqGKVtVt The following is my code:
import anndata as ad
import stream as st
WT_adata = ad.read_loom("Seurat_1000cells_integrated_WT_germline_scRNA.loom")
## convert sparse matrices into dense matrices
WT_adata.X = WT_adata.X.A
## Feature selection
st.select_variable_genes(WT_adata,loess_frac=0.01, n_genes=2000)
## select top principal components using variable genes
st.select_top_principal_components(WT_adata,n_pc=30,feature='var_genes',first_pc=False)
WT_adata.obsm['top_pcs'] = WT_adata.obsm['pca_cell_embeddings']
## dimension_reduction
st.dimension_reduction(WT_adata,method='mlle',feature='top_pcs',n_jobs=4,n_neighbors=30)
st.plot_dimension_reduction(WT_adata,color=['rough_cell_type'],n_components=2,show_graph=False,show_text=False)
## Trajectory inference
# Seeding the initial elastic principal graph
st.seed_elastic_principal_graph(WT_adata,n_clusters=10)
st.elastic_principal_graph(WT_adata,epg_alpha=0.05,epg_mu=0.05,epg_lambda=0.05,epg_trimmingradius=0.1)
#Extend leaf branch to reach further cells
st.extend_elastic_principal_graph(WT_adata)
## mapping
Mutant_adata = ad.read_loom("Seurat_1000cells_integrated_Mutant_germline_scRNA.loom")
Mutant_adata.obsm['top_pcs'] = Mutant_adata.obsm['pca_cell_embeddings']
# convert sparse matrices into dense matrices
Mutant_adata.X = Mutant_adata.X.A
adata_combined = st.map_new_data(WT_adata,Mutant_adata)
Thanks a lot for your help!
Hi,
Thanks for sharing the data. I was able to reproduce the error.
Basically, the error was caused by the automatic renaming of identically named variable keys when concatenating two anndata objects (the same variable key Selected is present in both WT_adata.var and Mutant_adata.var).
A simple workaround would be to rename either one of them, e.g.
Mutant_adata = ad.read_loom("./Seurat_1000cells_integrated_Mutant_germline_scRNA.loom")
st.set_workdir(WT_adata, './stream_result')
st.set_workdir(Mutant_adata, './stream_mapping_result')
Mutant_adata.var.rename(columns={"Selected": "Selected_mutant"},inplace=True)
Mutant_adata.obsm['top_pcs'] = Mutant_adata.obsm['pca_cell_embeddings']
# convert sparse matrices into dense matrices
Mutant_adata.X = Mutant_adata.X.A
adata_combined = st.map_new_data(WT_adata,Mutant_adata)
The above code works for me.
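For reference, the collision can be reproduced with plain pandas. The suffixing below mimics how concatenation disambiguates two var columns that share a name, which is why the original key Selected can no longer be looked up afterwards (the column values here are made up for illustration):

```python
import pandas as pd

# Two var tables sharing the column name 'Selected', as in
# WT_adata.var and Mutant_adata.var (values are hypothetical)
var_ref = pd.DataFrame({"Selected": [True, False]}, index=["geneA", "geneB"])
var_new = pd.DataFrame({"Selected": [False, True]}, index=["geneA", "geneB"])

# Joining suffixes the conflicting columns, so the plain name
# 'Selected' disappears from the combined table
combined = var_ref.join(var_new, lsuffix="-0", rsuffix="-1")
print(list(combined.columns))  # ['Selected-0', 'Selected-1']
```

Renaming one of the columns up front (as in the workaround) means there is no conflict to disambiguate, so the shared key survives the merge.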
Thanks for your help, and sorry to bother you again. I can successfully run the map_new_data() function now. However, when I tried to generate the dimension reduction plot, there was another error:
st.plot_flat_tree(adata_combined,color=['batch'],show_graph=True,show_text=True)
TypeError Traceback (most recent call last)
<ipython-input-68-66a7b0482b27> in <module>
1 ## since we learnt graph on visualization manifold, the dimension reduction plot will be the same as the visualization plot
----> 2 st.plot_flat_tree(adata_combined,color=['batch'],show_graph=True,show_text=True)
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/stream/core.py in plot_flat_tree(adata, color, dist_scale, fig_size, fig_ncol, fig_legend_ncol, fig_legend_order, vmin, vmax, alpha, pad, w_pad, h_pad, show_text, show_graph, save_fig, fig_path, fig_name, plotly)
2595 # ax_i.get_legend().texts[0].set_text("")
2596 else:
-> 2597 vmin_i = df_plot[ann].min() if vmin is None else vmin
2598 vmax_i = df_plot[ann].max() if vmax is None else vmax
2599 sc_i = ax_i.scatter(df_plot_shuf['FlatTree1'], df_plot_shuf['FlatTree2'],
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
11211 )
11212 def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
> 11213 return NDFrame.min(self, axis, skipna, level, numeric_only, **kwargs)
11214
11215 # pandas\core\generic.py:11009: error: Cannot assign to a method
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/generic.py in min(self, axis, skipna, level, numeric_only, **kwargs)
10715 def min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
10716 return self._stat_function(
> 10717 "min", nanops.nanmin, axis, skipna, level, numeric_only, **kwargs
10718 )
10719
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/generic.py in _stat_function(self, name, func, axis, skipna, level, numeric_only, **kwargs)
10710 return self._agg_by_level(name, axis=axis, level=level, skipna=skipna)
10711 return self._reduce(
> 10712 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
10713 )
10714
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
4171 if isinstance(delegate, ExtensionArray):
4172 # dispatch to ExtensionArray interface
-> 4173 return delegate._reduce(name, skipna=skipna, **kwds)
4174
4175 else:
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/arrays/_mixins.py in _reduce(self, name, skipna, **kwargs)
271 meth = getattr(self, name, None)
272 if meth:
--> 273 return meth(skipna=skipna, **kwargs)
274 else:
275 msg = f"'{type(self).__name__}' does not implement reduction '{name}'"
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
197 else:
198 kwargs[new_arg_name] = new_arg_value
--> 199 return func(*args, **kwargs)
200
201 return cast(F, wrapper)
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in min(self, skipna, **kwargs)
1957 nv.validate_minmax_axis(kwargs.get("axis", 0))
1958 nv.validate_min((), kwargs)
-> 1959 self.check_for_ordered("min")
1960
1961 if not len(self._codes):
/public/home/shipy3/miniconda3/envs/env_stream/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in check_for_ordered(self, op)
1435 if not self.ordered:
1436 raise TypeError(
-> 1437 f"Categorical is not ordered for operation {op}\n"
1438 "you can use .as_ordered() to change the "
1439 "Categorical to an ordered one\n"
TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one
I tried running adata_combined.obs['batch'].as_ordered() to change the batch column to an ordered one. However, it still failed to generate the dimension reduction plot. The link to download WT_adata and Mutant_adata has changed to https://mab.to/oWopBLhAh
No problem. I'm glad the above solution worked out for you.
Re the plotting error, unfortunately I was not able to reproduce it. With the same dataset you shared previously, the same function works for me.
But it seems to have something to do with the data type. Can you change the data type from category to str with the following code and give it a try again?
adata_combined.obs['batch'] = adata_combined.obs['batch'].astype(str)
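For context, the failure comes from pandas itself: min()/max() raise on an unordered Categorical, and as_ordered() returns a new object rather than modifying the column in place, which is likely why the earlier attempt had no effect. A minimal illustration (the batch values here are hypothetical):

```python
import pandas as pd

batch = pd.Series(["ref", "ref", "new"], dtype="category")

# Unordered categoricals reject min()/max()
try:
    batch.min()
except TypeError:
    print("min() fails on an unordered Categorical")

# as_ordered() returns a NEW series; without reassignment the
# original column stays unordered
batch = batch.cat.as_ordered()
print(batch.min())  # 'new' (smallest category, alphabetical order)

# Casting to plain strings side-steps the requirement entirely,
# which is what the fix above does
batch_str = batch.astype(str)
```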