In link prediction, filter nodes by prefix or other slots

Open caufieldjh opened this issue 2 years ago • 4 comments

Some graphs contain nodes we would like to filter for, but their Biolink categories don't distinguish them clearly:

PR:000002977    biolink:NamedThing                              Graph                                             owl:Class

So we would like to be able to filter by CURIE prefix rather than by category. This could be controlled by a flag in the link_node_types: block of the config.
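A minimal sketch of what prefix filtering could look like, assuming node IDs are CURIEs of the form prefix:local_id. The function names and the "no_prefix_node" entry are hypothetical and only for illustration; this is not how neat-ml implements it.

```python
def curie_prefix(node_id: str) -> str:
    """Return the part of a CURIE before the first colon, e.g. 'PR' for 'PR:000002977'."""
    return node_id.split(":", 1)[0]


def filter_nodes_by_prefix(node_ids, prefixes):
    """Keep only node IDs that contain a colon and whose prefix is in `prefixes`."""
    wanted = set(prefixes)
    return [n for n in node_ids if ":" in n and curie_prefix(n) in wanted]


# The two CURIEs come from the examples in this issue; the third ID is made up.
nodes = ["PR:000002977", "XPO:0134172", "no_prefix_node"]
print(filter_nodes_by_prefix(nodes, ["PR"]))
```

IDs without a colon are dropped rather than treated as their own prefix, which seems like the safer default for a filter.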

Similarly, it would be nice to be able to filter by other node slots/properties:

XPO:0134172     biolink:NamedThing      increased apoptosis in simple columnar epithelium       An increased occurrence of apoptotic process in simple columnar epithelium.                Graph

This could be as simple as a regex applied to the string value in a named column, e.g., match everything containing the string "apoptosis".
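As a rough sketch of the regex idea, assuming the node list is a TSV with a header row. The column names (id, category, name) and the function name are assumptions for illustration, not something the config supports today.

```python
import csv
import io
import re

# Small in-memory node list; the header names are assumed, the rows are
# trimmed versions of the examples above.
NODES_TSV = (
    "id\tcategory\tname\n"
    "PR:000002977\tbiolink:NamedThing\t\n"
    "XPO:0134172\tbiolink:NamedThing\tincreased apoptosis in simple columnar epithelium\n"
)


def filter_nodes_by_column(tsv_text, column, pattern):
    """Return the IDs of rows whose value in `column` matches the regex `pattern`."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Missing or empty values are treated as empty strings.
    return [row["id"] for row in reader if regex.search(row.get(column) or "")]


print(filter_nodes_by_column(NODES_TSV, "name", r"apoptosis"))
```

The same function covers the prefix case too, e.g. with column "id" and pattern r"^PR:".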

caufieldjh, Apr 27 '22 16:04

@LucaCappelletti94 you may have already solved this problem, in terms of filtering graph node lists by CURIE prefix and mapping the prefix to a namespace.

caufieldjh, May 02 '22 16:05

In ensmallen it is possible to filter by the prefix, but I do not know what you mean by mapping it to a namespace.

LucaCappelletti94, Jun 01 '22 16:06

Same thing as far as we're concerned: namespace == prefix, at least as far as node IDs go.

caufieldjh, Jun 01 '22 16:06

Ok, then graph.filter_from_names(...) has all of the kwargs you may desire for this sort of goal. It should be available in the latest nightly build if I am not mistaken (0.7.0.dev20).

LucaCappelletti94, Jun 01 '22 16:06