[BUG] VNviaSGM doesn't seem to work with the paper example
Expected Behavior
The example in the paper involves finding nominees for a node in G1 amongst nodes in G2, where n1 > n2. When I construct an example that does approximately this:
from graspologic.simulations import er_np
nsoc = 50
p = 0.5
Asoc = er_np(nsoc, p)
import numpy as np
# pick 40 students at random from the original network;
# these are the nodes that you have survey data for
nsurvey = 40
nodes_matched = np.sort(np.random.choice(nsoc, size=nsurvey, replace=False))
# copy over the corresponding subnetwork induced by nodes_matched
Asoc_ss = Asoc[nodes_matched,:][:,nodes_matched]
# remove 50% of the edges at random
# create a mask for the upper triangle
utri_mask = np.zeros((nsurvey, nsurvey), dtype=bool)
utri_mask[np.triu_indices(nsurvey, k=1)] = True
# zero out everything outside the upper triangle so each edge appears once
Asoc_ss[~utri_mask] = 0
# count the remaining nonzero edges
nnz = Asoc_ss.sum()
# choose the 50% of edges to retain
nz_edges = np.nonzero(Asoc_ss)
retain_edges = np.random.choice(nz_edges[1].shape[0], size=int(np.floor(0.5*nnz)),
                                replace=False)
Asurvey = np.zeros((nsurvey, nsurvey))
Asurvey[nz_edges[0][retain_edges], nz_edges[1][retain_edges]] = 1
# symmetrize
Asurvey = Asurvey + Asurvey.T
nvois = 1
np.random.seed(1234)
# pick a voi randomly from the nodes which have a matching pair in the survey
# and exclude the seed nodes
soc_nodes_nonseeds = nodes_matched[~np.in1d(nodes_matched, seeds_soc)]
voi = np.random.choice(soc_nodes_nonseeds, size=nvois, replace=False)[0]
from graspologic.nominate import VNviaSGM
vn_sgm = VNviaSGM(graph_match_kws = {'padding': 'adopted'})
nominees = vn_sgm.fit_predict(Asoc, Asurvey, int(voi), [seeds_soc, seeds_survey])
I get an error:
ValueError Traceback (most recent call last)
<ipython-input-78-49460625578b> in <module>
3 vn_sgm = VNviaSGM(graph_match_kws = {'padding': 'adopted'})
4
----> 5 nominees = vn_sgm.fit_predict(Asoc, Asurvey, int(voi), [seeds_soc, seeds_survey])
~/.virtualenvs/graph-book/lib/python3.8/site-packages/graspologic/nominate/VNviaSGM.py in fit_predict(self, A, B, voi, seeds)
350 The nomination list.
351 """
--> 352 self.fit(A, B, voi, seeds)
353
354 return self.nomination_list_
~/.virtualenvs/graph-book/lib/python3.8/site-packages/graspologic/nominate/VNviaSGM.py in fit(self, A, B, voi, seeds)
310 # include the seeds, so we must remove them from b_inds. Return a list
311 # sorted so it returns the vertex with the highest probability first.
--> 312 nomination_list_ = np.dstack((b_inds[self.n_seeds_ :], prob_vector))[0]
313 nomination_list_ = nomination_list_[nomination_list_[:, 1].argsort()][::-1]
314
<__array_function__ internals> in dstack(*args, **kwargs)
~/.virtualenvs/graph-book/lib/python3.8/site-packages/numpy/lib/shape_base.py in dstack(tup)
721 if not isinstance(arrs, list):
722 arrs = [arrs]
--> 723 return _nx.concatenate(arrs, 2)
724
725
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 23 and the array at index 1 has size 42
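The failing call can be isolated from graspologic. The sketch below is only an illustration of the `np.dstack` step from the traceback: the names `b_inds_tail` and `prob_vector` are hypothetical stand-ins, and only their lengths (23 and 42) are taken from the error message.

```python
import numpy as np

# Hypothetical stand-ins for b_inds[self.n_seeds_:] and prob_vector;
# only the lengths (23 and 42) come from the error message above.
b_inds_tail = np.arange(23)
prob_vector = np.linspace(0.0, 1.0, 42)

try:
    # np.dstack promotes each 1-D array to shape (1, N, 1) and concatenates
    # along axis 2, so the two lengths must be equal.
    np.dstack((b_inds_tail, prob_vector))[0]
    raised = False
except ValueError as err:
    raised = True
    print(err)

# With equal lengths the same call yields one (index, probability) row per vertex.
ok = np.dstack((np.arange(42), prob_vector))[0]
print(ok.shape)
```

This suggests the index array and the probability vector are built from different node counts inside `fit`, which would be consistent with a padding-related mismatch, though it does not confirm where the mismatch originates.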
When I swap the order of the two networks, this error goes away, so my guess is that there is a bug in how padded nodes are handled.
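Note that the snippet above uses `seeds_soc` and `seeds_survey` without showing how they were built. A minimal sketch of one way they could be constructed follows. It is an assumption, not the code that produced the traceback: the seed count `nseeds` is a hypothetical value, and `np.random.default_rng` is used in place of the legacy `np.random` calls in the report.

```python
import numpy as np

rng = np.random.default_rng(0)
nsoc, nsurvey = 50, 40
nseeds = 10  # assumed value; the report does not state the number of seeds

# Same construction as in the report: node i of the survey network
# corresponds to node nodes_matched[i] of the original network.
nodes_matched = np.sort(rng.choice(nsoc, size=nsurvey, replace=False))

# Pick seed positions in the survey network, then map them back to the
# original network so the two seed lists are aligned pairwise.
seeds_survey = np.sort(rng.choice(nsurvey, size=nseeds, replace=False))
seeds_soc = nodes_matched[seeds_survey]

# Choose the vertex of interest among matched nodes that are not seeds.
soc_nodes_nonseeds = nodes_matched[~np.isin(nodes_matched, seeds_soc)]
voi = int(rng.choice(soc_nodes_nonseeds))
```

With seeds built this way, `seeds_soc[k]` and `seeds_survey[k]` refer to the same underlying node, which is the pairing that the `[seeds_soc, seeds_survey]` argument appears to expect.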