squidpy

added design matrix and plotting function

Open LLehner opened this issue 2 years ago • 8 comments


Description

  • Added a function that builds a design matrix of distances to anchor points, which can be used for plotting and model building
  • Added a plotting function that visualizes gene expression as a function of the (normalized) distance to anchor points
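As a rough illustration of what such a design matrix can hold, here is a minimal NumPy/pandas sketch. The function name `distance_design_matrix`, the column names, the Euclidean metric and the normalization (dividing by the largest distance) are assumptions for illustration only; this is not the PR's `exp_dist` implementation.

```python
import numpy as np
import pandas as pd


def distance_design_matrix(coords, labels, anchor):
    """Hypothetical sketch: distance of every cell to the nearest cell of the `anchor` group."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    anchor_coord = coords[labels == anchor]
    if anchor_coord.shape[0] == 0:
        raise ValueError(f"no cells labelled {anchor!r}")
    # (n_cells, n_anchors) Euclidean distances, then the nearest anchor per cell
    dist = np.linalg.norm(coords[:, None, :] - anchor_coord[None, :, :], axis=-1).min(axis=1)
    # assumed normalization: divide by the largest distance so values lie in [0, 1]
    max_dist = dist.max()
    norm = dist / max_dist if max_dist > 0 else np.zeros_like(dist)
    return pd.DataFrame({"cluster": labels, anchor: dist, f"{anchor}_norm": norm})


dm = distance_design_matrix([[0, 0], [3, 4], [6, 8]], ["A", "B", "B"], anchor="A")
print(dm)
```

Because anchor cells are themselves cells, their distance is 0 by construction, and the normalized column can be fed directly to a plot of expression against distance.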

How has this been tested?

Tested on two data sets

Closes

LLehner, Aug 15 '22

Codecov Report

Merging #591 (d942632) into main (0cd835d) will increase coverage by 0.07%. The diff coverage is 80.38%.

@@            Coverage Diff             @@
##             main     #591      +/-   ##
==========================================
+ Coverage   78.56%   78.63%   +0.07%     
==========================================
  Files          31       33       +2     
  Lines        4492     4699     +207     
  Branches      865      917      +52     
==========================================
+ Hits         3529     3695     +166     
- Misses        708      732      +24     
- Partials      255      272      +17     
Impacted Files                    Coverage Δ
squidpy/pl/_graph.py              79.20% <33.33%> (-2.62%) ↓
squidpy/tl/_var_by_distance.py    78.51% <78.51%> (ø)
squidpy/pl/_var_by_distance.py    88.05% <88.05%> (ø)
squidpy/gr/_sepal.py              52.63% <100.00%> (+0.35%) ↑
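Codecov computes project coverage as hits divided by all tracked lines (hits + misses + partials), so partially covered lines count against the total. Assuming that formula, the percentages in the coverage diff can be reproduced with a few lines of Python:

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Line coverage in percent; partial lines count as not covered (assumed Codecov convention)."""
    total = hits + misses + partials
    return 100.0 * hits / total


base = coverage_pct(3529, 708, 255)  # main: 4492 lines
head = coverage_pct(3695, 732, 272)  # #591: 4699 lines
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
```

This prints `78.56% -> 78.63% (+0.07%)`, matching the report.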

codecov-commenter, Aug 15 '22

Hi @LLehner,

thanks a lot for this PR, it looks good! A couple of things before I review the code:

  • Can you create a new module (folder) named tl?
  • Can you move the _design_matrix.py file there, rename it to _exp_dist.py, and rename the function to exp_dist?
  • Can you also rename the plotting file to _exp_dist.py?

Thank you! Looking forward to adding this to Squidpy!

giovp, Aug 15 '22

@LLehner I added a couple of TODOs on the function and started a skeleton for the plotting test. A test for the function itself should also be added.
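A test for the function itself could pin down simple invariants, for example that cells of the anchor group get distance 0 and that the normalized distance stays in [0, 1]. Below is a pytest-style sketch; `nearest_anchor_distance` is a hypothetical stand-in for the function under test (a real test would import the actual function instead), and the random coordinates are synthetic.

```python
import numpy as np


def nearest_anchor_distance(coords, labels, anchor):
    """Hypothetical stand-in: Euclidean distance from each cell to the nearest anchor cell."""
    coords = np.asarray(coords, dtype=float)
    is_anchor = np.asarray(labels) == anchor
    diff = coords[:, None, :] - coords[is_anchor][None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)


def test_anchor_cells_have_zero_distance():
    rng = np.random.default_rng(0)
    coords = rng.uniform(size=(50, 2))
    labels = np.where(np.arange(50) % 5 == 0, "anchor", "other")
    dist = nearest_anchor_distance(coords, labels, "anchor")
    # every anchor is its own nearest anchor
    assert np.all(dist[labels == "anchor"] == 0)
    assert np.all(dist >= 0)
    # normalized distances must stay within [0, 1]
    norm = dist / dist.max()
    assert norm.min() >= 0 and norm.max() <= 1
```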

giovp, Aug 18 '22

I think the imports are still wrong; see the error:

  [2819] /home/runner/work/squidpy/squidpy$ /home/runner/work/squidpy/squidpy/.tox/py310-linux/bin/python -m pytest --cov --cov-append --cov-report=term-missing --cov-config=/home/runner/work/squidpy/squidpy/tox.ini --ignore docs/ -vv --test-napari
  ImportError while loading conftest '/home/runner/work/squidpy/squidpy/tests/conftest.py'.
  tests/conftest.py:23: in <module>
      from squidpy.gr import spatial_neighbors
  squidpy/__init__.py:1: in <module>
      from squidpy import gr, im, pl, read, datasets
  squidpy/pl/__init__.py:13: in <module>
      from squidpy.pl._feature_by_dist import plot_gexp_dist
  E   ModuleNotFoundError: No module named 'squidpy.pl._feature_by_dist'
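The traceback says `squidpy/pl/__init__.py` imports `plot_gexp_dist` from `squidpy.pl._feature_by_dist`, which does not exist under that name. One way to catch stale import paths like this before running the whole suite is to check that each dotted module name resolves with `importlib.util.find_spec`. A small stdlib-only sketch; `missing_modules` is a hypothetical helper, and the stdlib `json` names are just examples.

```python
import importlib.util


def missing_modules(names):
    """Return the dotted module names that cannot be resolved."""
    missing = []
    for name in names:
        try:
            # find_spec imports the parent packages, then looks up the last component without importing it
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # raised when a parent package itself is missing
            spec = None
        if spec is None:
            missing.append(name)
    return missing


print(missing_modules(["json.decoder", "json._feature_by_dist"]))
```

This prints `['json._feature_by_dist']`: the submodule path is reported as missing while the existing one resolves.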

giovp, Aug 18 '22

@giovp I applied the changes, though there were some minor things I had to change. It's on the spatialde repo.

LLehner, Sep 17 '22

What kind of changes?

giovp, Sep 17 '22

This fails if I want to compute distances on a subset of adata only. This is the spatial plot of the full data: [image] I then subset the data to a tenth of the whole data, keeping the cell type proportions balanced: [image]

Could this have something to do with _prune_anchor_tree, which has hard-coded parameters?

  exp_dist(
      adata=adata_subset,
      groups="CK+ HR+ tumor cell",
      cluster_key="cell type",
      design_matrix_key="design_matrix",
      batch_key=None,
      covariates=None,
      metric="euclidean",
      copy=True,
  )

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [35], line 5
      3 print(adata[adata.obs.slide==slide,:])
      4 display(adata[adata.obs.slide==slide,:].obs['cell type'].value_counts())
----> 5 exp_dist(adata=adata[adata.obs.slide==slide,:].copy(),
      6     groups='CK+ HR+ tumor cell',
      7     cluster_key='cell type',
      8     design_matrix_key = "design_matrix",
      9     batch_key = None,
     10     covariates = None,
     11     metric = "euclidean",
     12     copy = True)

File ~/Documents/GitHub/spatial-de-2022/spatialde/functions/exp_dist.py:99, in exp_dist(adata, groups, cluster_key, design_matrix_key, batch_key, covariates, spatial_key, metric, copy)
     95     anchor_coord, batch_coord = _get_coordinates(adata, anchor_var, cluster_key)
     97 anchor_coord = _prune_anchor_tree(anchor_coord, 0.05, 4, metric)
---> 99 tree = KDTree(anchor_coord, metric=DistanceMetric.get_metric(metric))
    100 mindist, _ = tree.query(batch_coord)
    102 if isinstance(anchor_var, np.ndarray):

File sklearn/neighbors/_binary_tree.pxi:833, in sklearn.neighbors._kd_tree.BinaryTree.__init__()

File ~/opt/miniconda3/envs/spatial-de-2022/lib/python3.8/site-packages/sklearn/utils/validation.py:909, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    907     n_samples = _num_samples(array)
    908     if n_samples < ensure_min_samples:
--> 909         raise ValueError(
    910             "Found array with %d sample(s) (shape=%s) while a"
    911             " minimum of %d is required%s."
    912             % (n_samples, array.shape, ensure_min_samples, context)
    913         )
    915 if ensure_min_features > 0 and array.ndim == 2:
    916     n_features = array.shape[1]

ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required.
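The traceback shows that `_prune_anchor_tree(anchor_coord, 0.05, 4, metric)` returned an empty array, so `KDTree` received 0 anchors. The actual pruning logic is not shown in this thread; as an assumption, if `0.05` and `4` act like a neighborhood radius and a minimum neighbor count, subsampling can thin the anchors out until none of them has enough neighbors. The sketch below uses such a hypothetical density rule (`prune_isolated_anchors`), brute-force NumPy distances in place of `KDTree`, and an explicit guard (`min_dist_to_anchor`, also hypothetical) so the failure is reported clearly.

```python
import numpy as np


def prune_isolated_anchors(anchor_coord, radius, min_neighbors):
    """Hypothetical pruning rule: keep anchors that have at least `min_neighbors`
    other anchors within `radius`."""
    d = np.linalg.norm(anchor_coord[:, None, :] - anchor_coord[None, :, :], axis=-1)
    n_close = (d <= radius).sum(axis=1) - 1  # subtract 1 so an anchor does not count itself
    return anchor_coord[n_close >= min_neighbors]


def min_dist_to_anchor(cell_coord, anchor_coord):
    """Distance from each cell to its nearest anchor, failing loudly if no anchors are left."""
    if anchor_coord.shape[0] == 0:
        raise ValueError(
            "No anchor points left after pruning; the subset may be too sparse "
            "for the pruning parameters."
        )
    d = np.linalg.norm(cell_coord[:, None, :] - anchor_coord[None, :, :], axis=-1)
    return d.min(axis=1)


# Anchors on a dense 10 x 10 grid (spacing 0.02) vs. every third row/column (spacing 0.06)
xs = np.arange(10) * 0.02
gx, gy = np.meshgrid(xs, xs)
full = np.column_stack([gx.ravel(), gy.ravel()])
subset = np.column_stack([gx[::3, ::3].ravel(), gy[::3, ::3].ravel()])

print(prune_isolated_anchors(full, 0.05, 4).shape)    # every anchor survives
print(prune_isolated_anchors(subset, 0.05, 4).shape)  # every anchor is pruned
```

With the same fixed parameters, all 100 dense anchors survive while all 16 sparse ones are removed, which would then fail in the distance step. Scaling the radius with point density, or raising a clear error as above, are two possible ways to handle this.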

Hrovatin, Sep 21 '22

The resulting distances dataframe also contains the reference cell type, but not all reference cells have a distance of 0.

ref_ct = "T cells"
distances = exp_dist(
    adata=adata,
    groups=ref_ct,
    cluster_key="cell type",
    design_matrix_key="design_matrix",
    batch_key=None,  # Currently not working on mock data
    covariates=None,
    metric="euclidean",
    copy=True,
)

ref_cells = distances[distances["cell type"] == ref_ct]
print("N ref cells:", ref_cells.shape[0])
display(ref_cells[ref_cells[ref_ct] > 0])

N ref cells: 290
[image]
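One possible explanation, consistent with the pruning step in the earlier traceback but not confirmed in this thread, is that reference cells dropped by `_prune_anchor_tree` are no longer anchors themselves and therefore get a positive distance to the nearest remaining anchor. If reference cells should always sit at distance 0, a post-hoc pandas sketch could report and enforce that. The column names mirror the snippet above; the small dataframe is a toy stand-in with illustrative values, not real output.

```python
import pandas as pd

ref_ct = "T cells"
# Toy stand-in for the `distances` dataframe returned by exp_dist (values are illustrative)
distances = pd.DataFrame(
    {
        "cell type": ["T cells", "T cells", "B cells", "B cells"],
        ref_ct: [0.0, 0.3, 1.2, 0.8],
    }
)

is_ref = distances["cell type"] == ref_ct
# Report reference cells whose distance to their own group is not 0
print("N ref cells:", int(is_ref.sum()))
print("N ref cells with distance > 0:", int((distances.loc[is_ref, ref_ct] > 0).sum()))

# Force the distance of reference cells to their own group to 0
distances.loc[is_ref, ref_ct] = 0.0
```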

Hrovatin, Sep 21 '22

@LLehner I'm fixing the pre-commits in #643, but the tests still fail.

giovp, Feb 15 '23

@LLehner it seems the tests are failing because of this:

  .tox/py/lib/python3.9/site-packages/_pytest/assertion/rewrite.py:168: in exec_module
      exec(co, module.__dict__)
  tests/graph/test_design_matrix.py:9: in <module>
      from squidpy.tl.exp_dist import exp_dist
  E   ModuleNotFoundError: No module named 'squidpy.tl.exp_dist'

This is because you need to export exp_dist from the tl __init__.py file; see the other modules for reference.
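The failing line imports `squidpy.tl.exp_dist` as if it were a module, but no file `exp_dist.py` exists there. Re-exporting the function in `tl/__init__.py` lets callers write `from squidpy.tl import exp_dist` instead. The stdlib sketch below builds a throwaway package `demo_pkg` (a hypothetical stand-in for squidpy) in a temporary directory to show both import styles; the `_exp_dist.py` file name follows the layout requested earlier in the thread.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
tl = root / "demo_pkg" / "tl"
tl.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
# private implementation module
(tl / "_exp_dist.py").write_text("def exp_dist():\n    return 'ok'\n")
# public re-export in the subpackage's __init__.py
(tl / "__init__.py").write_text("from demo_pkg.tl._exp_dist import exp_dist\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_pkg.tl import exp_dist  # works: exp_dist is an attribute of demo_pkg.tl

print(exp_dist())

err_msg = None
try:
    importlib.import_module("demo_pkg.tl.exp_dist")  # fails: there is no demo_pkg/tl/exp_dist.py
except ModuleNotFoundError as err:
    err_msg = str(err)
print(err_msg)
```

The second import fails with the same kind of `ModuleNotFoundError` as in the CI log, even though the function is exported.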

giovp, Feb 17 '23

Linting, on the other hand, is failing because of the pre-commit hooks. I believe you'd have to rerun them, but ruff should fix most of the issues in place.

giovp, Feb 17 '23

@LLehner can you join Zulip (https://scverse.zulipchat.com/)? The Helmholtz services are all down; I'll explain there.

giovp, Mar 16 '23

Also consider cleaning up the docstrings by reusing what we already have in docrep.

giovp, Mar 28 '23