nxontology
Generating a numpy array of pairwise similarity scores
Here's some example code to generate pairwise Lin similarity scores for all node pairs in an NXOntology. For now I'm just posting it in case it's helpful, but we could also create a function that populates a matrix with a chosen similarity metric.
import logging
from itertools import combinations_with_replacement

import fsspec
import numpy as np
import numpy.typing as npt
from nxontology import NXOntology


def generate_similarity_matrix(nxo: NXOntology[str]) -> npt.NDArray[np.float32]:
    """Return a symmetric array of pairwise lin similarity for all nodes in nxo."""
    nxo.freeze()
    nodes = list(nxo.graph.nodes)
    # ensure nodes are sorted, since matrix does not store row/column names
    assert sorted(nodes) == nodes
    similarity_array = np.zeros(shape=(nxo.n_nodes, nxo.n_nodes), dtype=np.float32)
    logging.info(
        f"Initialized {similarity_array.shape} array:\n{similarity_array[:5, :5]}"
    )
    # lin is symmetric, so we use combinations_with_replacement rather than product
    for (row, row_efo), (col, col_efo) in combinations_with_replacement(
        list(enumerate(nodes)), r=2
    ):
        similarity = nxo.similarity(row_efo, col_efo)
        similarity_array[row, col] = similarity.lin
        # mirror into the other triangle; only works for symmetric metrics
        similarity_array[col, row] = similarity.lin
    logging.info(f"Populated array with similarity:\n{similarity_array[:5, :5]}")
    return similarity_array


# nxo is an existing NXOntology instance (EFO in this case)
similarity_array = generate_similarity_matrix(nxo)
path = "similarity-lin.npy.xz"
# compression="infer" picks XZ from the .xz file extension
with fsspec.open(path, "wb", compression="infer") as write_file:
    np.save(write_file, similarity_array)
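The function above hard-codes lin. As a rough sketch of the more general "populate a matrix with a similarity metric" idea, the scoring step could be passed in as a callable. The names `populate_symmetric_matrix` and `toy_score` are hypothetical and not part of nxontology, and the toy metric only stands in for a real one so the example runs without an ontology:

```python
from itertools import combinations_with_replacement
from typing import Callable, Sequence

import numpy as np
import numpy.typing as npt


def populate_symmetric_matrix(
    nodes: Sequence[str],
    score: Callable[[str, str], float],
) -> npt.NDArray[np.float32]:
    """Fill an n x n array with score(a, b), assuming score is symmetric."""
    matrix = np.zeros(shape=(len(nodes), len(nodes)), dtype=np.float32)
    # Visit each unordered pair once, including each node paired with itself.
    for (row, a), (col, b) in combinations_with_replacement(enumerate(nodes), 2):
        value = score(a, b)
        matrix[row, col] = value
        matrix[col, row] = value  # mirror into the other triangle
    return matrix


def toy_score(a: str, b: str) -> float:
    """Toy stand-in metric: Jaccard overlap of the characters in two names."""
    return len(set(a) & set(b)) / len(set(a) | set(b))


toy_nodes = sorted(["ab", "abc", "cd"])
toy_matrix = populate_symmetric_matrix(toy_nodes, toy_score)
```

With a real ontology, `score` could presumably be something like `lambda a, b: nxo.similarity(a, b).lin`, with the attribute swapped for another symmetric metric.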
On EFO, saving as an XZ-compressed npy file worked well. scipy.sparse matrices are another option, but whether they are slower or faster to work with depends on the operation and on how sparse the matrix is.
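For reading the file back, here is a minimal round-trip sketch. It makes a few assumptions that are not in the original post: it uses the standard-library `lzma` module instead of fsspec so it runs without extra dependencies, the node names and values are toy data, and the JSON sidecar file for the node order is a hypothetical convention (needed somewhere, since the array itself does not store row/column names).

```python
import json
import lzma
import tempfile
from pathlib import Path

import numpy as np

# Toy stand-ins for the real node list and matrix (values are illustrative only).
nodes = ["a", "b", "c"]
similarity_array = np.array(
    [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]], dtype=np.float32
)

with tempfile.TemporaryDirectory() as tmp:
    array_path = Path(tmp) / "similarity-lin.npy.xz"
    nodes_path = Path(tmp) / "similarity-lin.nodes.json"  # hypothetical sidecar
    # Write the XZ-compressed array and the node order next to it.
    with lzma.open(array_path, "wb") as write_file:
        np.save(write_file, similarity_array)
    nodes_path.write_text(json.dumps(nodes))

    # Read both files back.
    with lzma.open(array_path, "rb") as read_file:
        loaded = np.load(read_file)
    loaded_nodes = json.loads(nodes_path.read_text())

# Map each node name to its row/column position in the array.
index = {node: i for i, node in enumerate(loaded_nodes)}


def lookup(a: str, b: str) -> float:
    """Return the stored similarity for a pair of node names."""
    return float(loaded[index[a], index[b]])
```

Calling `lookup("b", "c")` and `lookup("c", "b")` returns the same value because the stored matrix is symmetric.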