sf.define_domains() - ValueError: The number of observations cannot be determined on an empty distance matrix.

Open m-petersen opened this issue 2 months ago • 3 comments

Hi,

Thanks for this great package! I am trying to use SAFE to annotate metadata on a low-dimensional graph representation of microbiome abundance data (similar to: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1871-4). I used Mapper to build the graph: nodes represent groups of subjects, and edges connect nodes that share subjects. I now want to annotate this graph with node-level information on age, sex and other covariates using safepy. Loading the graph and the annotations works fine, and I can obtain the enrichment landscapes of individual covariates with sf.plot_sample_attributes().

However, when I try to plot a composite landscape, I get an error that I do not fully understand. sf.define_top_attributes() runs without error, but sf.define_domains() fails with the following traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_6330/3930251677.py in <module>
----> 1 sf.define_domains(attribute_distance_threshold = 0.65)

~/miniconda3/envs/brainstat/lib/python3.7/site-packages/safepy/safe.py in define_domains(self, **kwargs)
    661 
    662         m = self.nes_binary[:, self.attributes['top']].T
--> 663         Z = linkage(m, method='average', metric=self.attribute_distance_metric)
    664         max_d = np.max(Z[:, 2] * self.attribute_distance_threshold)
    665         domains = fcluster(Z, max_d, criterion='distance')

~/miniconda3/envs/brainstat/lib/python3.7/site-packages/scipy/cluster/hierarchy.py in linkage(y, method, metric, optimal_ordering)
   1066                          "finite values.")
   1067 
-> 1068     n = int(distance.num_obs_y(y))
   1069     method_code = _LINKAGE_METHODS[method]
   1070 

~/miniconda3/envs/brainstat/lib/python3.7/site-packages/scipy/spatial/distance.py in num_obs_y(Y)
   2570     k = Y.shape[0]
   2571     if k == 0:
-> 2572         raise ValueError("The number of observations cannot be determined on "
   2573                          "an empty distance matrix.")
   2574     d = int(np.ceil(np.sqrt(k * 2)))

ValueError: The number of observations cannot be determined on an empty distance matrix.
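
The failure can be reproduced outside safepy, which may help narrow it down. The sketch below is only an illustration under assumptions: the small arrays are made-up stand-ins for sf.nes_binary and sf.attributes['top'], no attribute is flagged as top, and 'jaccard' is assumed as the distance metric (the actual value of attribute_distance_metric is not visible in the traceback).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Made-up stand-in for sf.nes_binary: 6 nodes (rows) x 3 attributes (columns).
nes_binary = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
])

# Made-up stand-in for sf.attributes['top'] when no attribute qualifies.
top = np.array([False, False, False])

# Same selection as line 662 of safe.py in the traceback.
m = nes_binary[:, top].T
print(m.shape)  # (0, 6): zero attributes left to cluster

try:
    # 'jaccard' is an assumption, not taken from the traceback.
    linkage(m, method='average', metric='jaccard')
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)
```

With an all-False mask the selected matrix has no rows, so there are no pairwise distances for linkage() to work with.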

From my reading of the code, sf.define_top_attributes() did not identify any top attributes in my data because the criterion of a single connected component was not met (in my analysis, the metadata variables have more than one connected component). This would leave the matrix passed to linkage() empty. My goal is a composite contour plot, so that I can use the enrichment landscapes to investigate whether the dominance of certain microbes is linked to specific covariates. Is it possible to change some of the default parameters to obtain composite maps for my use case? Do you think annotating subject-level networks with your package is valid in general, or am I missing something?
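
One way to confirm this before the clustering step is to count the flagged attributes. The helper below is a minimal sketch, not part of safepy: require_top_attributes is a hypothetical name, and the list passed to it is a made-up stand-in for sf.attributes['top'].

```python
import numpy as np

def require_top_attributes(top_mask):
    """Return the number of attributes flagged as top; raise if there are none."""
    n_top = int(np.count_nonzero(np.asarray(top_mask, dtype=bool)))
    if n_top == 0:
        raise RuntimeError(
            "No attributes are flagged as top, so there is nothing to cluster."
        )
    return n_top

# Made-up mask standing in for sf.attributes['top'].
print(require_top_attributes([True, False, True]))  # 2
```

With safepy this could presumably be called as require_top_attributes(sf.attributes['top']) between sf.define_top_attributes() and sf.define_domains(), assuming sf.attributes is the same table that the traceback accesses as self.attributes.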

Your help would be highly appreciated. Many thanks in advance! You can find the corresponding Jupyter notebook containing the code and the error via this link. :)

m-petersen, Apr 29 '24 13:04