[BUG] VNviaSGM doesn't seem to work with the paper example

Open ebridge2 opened this issue 2 years ago • 0 comments

Expected Behavior

The example in the paper involves finding nominees for a node in G1 amongst nodes in G2, where n1 > n2. When I construct an example that does approximately this:

from graspologic.simulations import er_np

nsoc = 50
p = 0.5
Asoc = er_np(nsoc, p)

import numpy as np
# pick 40 students at random from the original network;
# these are the nodes that you have survey data for
nsurvey = 40
nodes_matched = np.sort(np.random.choice(nsoc, size=nsurvey, replace=False))
# copy over the corresponding subnetwork induced by nodes_matched
Asoc_ss = Asoc[nodes_matched,:][:,nodes_matched]

# remove 50% of the edges at random
# create a mask for upper triangle
utri_mask = np.zeros((nsurvey, nsurvey), dtype=bool)
utri_mask[np.triu_indices(nsurvey, k=1)] = True
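# zero out everything outside the upper triangle so each undirected edge appears once
Asoc_ss[~utri_mask] = 0
# count the remaining (upper-triangle) edges
nnz = Asoc_ss.sum()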
# choose the 50% of edges to remain
nz_edges = np.nonzero(Asoc_ss)
retain_edges = np.random.choice(nz_edges[1].shape[0], size=int(np.floor(0.5*nnz)),
                                replace=False)

Asurvey = np.zeros((nsurvey, nsurvey))
Asurvey[nz_edges[0][retain_edges], nz_edges[1][retain_edges]] = 1
# symmetrize
Asurvey = Asurvey + Asurvey.T

nvois = 1
np.random.seed(1234)
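# pick a voi randomly from the nodes which have a matching pair in the survey
# and exclude the seed nodes
# (seeds_soc and seeds_survey, the paired seed indices into Asoc and Asurvey,
# are not defined in this snippet)
soc_nodes_nonseeds = nodes_matched[~np.isin(nodes_matched, seeds_soc)]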
voi = np.random.choice(soc_nodes_nonseeds, size=nvois, replace=False)[0]

from graspologic.nominate import VNviaSGM

vn_sgm = VNviaSGM(graph_match_kws = {'padding': 'adopted'})

nominees = vn_sgm.fit_predict(Asoc, Asurvey, int(voi), [seeds_soc, seeds_survey])

I get an error:

ValueError                                Traceback (most recent call last)
<ipython-input-78-49460625578b> in <module>
      3 vn_sgm = VNviaSGM(graph_match_kws = {'padding': 'adopted'})
      4 
----> 5 nominees = vn_sgm.fit_predict(Asoc, Asurvey, int(voi), [seeds_soc, seeds_survey])

~/.virtualenvs/graph-book/lib/python3.8/site-packages/graspologic/nominate/VNviaSGM.py in fit_predict(self, A, B, voi, seeds)
    350             The nomination list.
    351         """
--> 352         self.fit(A, B, voi, seeds)
    353 
    354         return self.nomination_list_

~/.virtualenvs/graph-book/lib/python3.8/site-packages/graspologic/nominate/VNviaSGM.py in fit(self, A, B, voi, seeds)
    310         # include the seeds, so we must remove them from b_inds. Return a list
    311         # sorted so it returns the vertex with the highest probability first.
--> 312         nomination_list_ = np.dstack((b_inds[self.n_seeds_ :], prob_vector))[0]
    313         nomination_list_ = nomination_list_[nomination_list_[:, 1].argsort()][::-1]
    314 

<__array_function__ internals> in dstack(*args, **kwargs)

~/.virtualenvs/graph-book/lib/python3.8/site-packages/numpy/lib/shape_base.py in dstack(tup)
    721     if not isinstance(arrs, list):
    722         arrs = [arrs]
--> 723     return _nx.concatenate(arrs, 2)
    724 
    725 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 23 and the array at index 1 has size 42

When I flip the networks around, this error goes away, so my guess is there is some bug with respect to how padded nodes are handled.
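
For reference, the final ValueError can be reproduced with NumPy alone. This is a minimal sketch, not the library's code: `b_inds_tail` and `prob_vector` are hypothetical stand-ins for the two arrays passed to `np.dstack` in `VNviaSGM.fit`, and only their lengths (23 and 42) are taken from the error message above.

```python
import numpy as np

# Hypothetical stand-ins for the arrays stacked in VNviaSGM.fit.
# Only the lengths (23 and 42) come from the traceback; the values are made up.
b_inds_tail = np.arange(23)          # plays the role of b_inds[self.n_seeds_:]
prob_vector = np.linspace(0, 1, 42)  # plays the role of prob_vector


def try_dstack(a, b):
    """Return the (n, 2) stacked array, or the error message if lengths differ."""
    try:
        return np.dstack((a, b))[0]
    except ValueError as err:
        return str(err)


# Lengths 23 vs 42: np.dstack raises ValueError, as in the traceback.
mismatch = try_dstack(b_inds_tail, prob_vector)
print(mismatch)

# Equal lengths: one (index, probability) row per candidate.
matched = try_dstack(b_inds_tail, prob_vector[:23])
print(matched.shape)
```

So the failure itself only says that the index array and the probability vector have different lengths at the point where they are stacked. Whether that comes from the padding is still a guess.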

ebridge2, Mar 23 '22 04:03