RTX Utilize node/edge normalizer on external KP bioentities

From the Architecture call today and the associated slides: we shouldn't trust that the KPs are using the "correct" identifiers. Tyler and Chris both suggest that we check identifiers against the normalizers to make sure they are in alignment.

Dec 08 '20 19:12 dkoslicki

so as long as expand is called with use_synonyms=true (the default), I think this is already effectively being done for nodes. expand runs any node IDs returned by KPs through the NodeSynonymizer and converts them to their "preferred" (canonical) curies before adding them to the KG. (and the NodeSynonymizer ingests data from the SRI Node Normalizer... and I think uses whatever "preferred" curie the SRI Node Normalizer suggests?)

but edges from external KPs are not normalized currently (expand currently just leaves predicates as they are). so I suppose perhaps that should be done!
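A minimal sketch of the node-side canonicalization described above, to make the node/edge asymmetry concrete. It assumes a plain dict `preferred_curie` stands in for the NodeSynonymizer lookup; the `Edge` class and `canonicalize` function are hypothetical names, not the actual expand code. Node IDs and edge endpoints are rewritten to preferred curies, while predicates pass through exactly as the KP returned them.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    subject_id: str
    predicate: str
    object_id: str


def canonicalize(nodes: List[str], edges: List[Edge],
                 preferred_curie: Dict[str, str]) -> Tuple[List[str], List[Edge]]:
    """Rewrite KP-returned curies to preferred curies.

    `preferred_curie` is a hypothetical stand-in for the NodeSynonymizer lookup.
    Curies it does not know are kept unchanged; predicates are never touched.
    """
    def canon(curie: str) -> str:
        return preferred_curie.get(curie, curie)

    # Nodes that are synonyms collapse onto one preferred curie (first-seen order kept).
    new_nodes = list(dict.fromkeys(canon(c) for c in nodes))
    # Only edge endpoints are remapped; the predicate is left as the KP returned it.
    new_edges = [replace(e, subject_id=canon(e.subject_id), object_id=canon(e.object_id))
                 for e in edges]
    return new_nodes, new_edges
```

For example, with the illustrative mapping `{"KP_A:1": "DOID:123", "KP_B:7": "DOID:123"}`, the nodes `["KP_A:1", "KP_B:7", "CHEBI:1"]` collapse to `["DOID:123", "CHEBI:1"]`, and an edge pointing at `KP_B:7` is redirected to `DOID:123` with its predicate unchanged. Normalizing predicates would need a separate step.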

Dec 08 '20 22:12 amykglen

Thank you for the update, @dkoslicki. This is useful information.

Dec 09 '20 17:12 saramsey

@amykglen does the NodeSynonymizer know about all curies, or just the ones in ARAX/KG2? I ask since I'm wondering about the case where a KP returns a curie we haven't seen before. I guess this boils down to: do we consult the SRI node normalizer on the fly, or only in bulk for the curies in KG2?

Dec 09 '20 20:12 dkoslicki

good question - I'm not totally sure of the answer. I don't think the SRI node normalizer is ever consulted on the fly currently, but I'm not sure exactly what info is ingested from it in bulk during the ARAX node synonymizer build process. @edeutsch? does the synonymizer grab all info from the SRI node normalizer? or only info for nodes in KG2?

Dec 09 '20 21:12 amykglen

During the NodeSynonymizer database build process, it downloads all synonyms for all concepts in KG2 from the SRI Node Normalizer API. So if DOID:123 is in KG2, then all synonyms that the SRI normalizer has for that concept are in the NodeSynonymizer's synonyms list, irrespective of whether those synonymous curies are themselves in KG2.

However, if disease X is completely absent from KG2, then it will not be in the NodeSynonymizer's database and will be unknown.

It may well be a lot more complete to begin with the SRI Node Normalizer's source database rather than querying its endpoint for all of our concepts, but I did not see a way to get their complete database. And the basic assumption was that if we don't have the concept (by any CURIE) in KG2, we wouldn't be able to do anything useful with it anyway. But as we expand to other KPs, maybe that assumption isn't so good.
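A minimal sketch of the coverage rule described above: the table is seeded only from KG2 curies, but every synonym in each concept's clique gets recorded. `lookup` is a hypothetical stand-in for a per-curie normalizer query (in practice this would wrap a call to the SRI Node Normalizer API, e.g. its `get_normalized_nodes` endpoint). The clique shape and all names here are assumptions for illustration, not the actual build code.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (preferred curie, all equivalent curies) for one concept, as a normalizer might report it.
Clique = Tuple[str, List[str]]


def build_synonym_table(kg2_curies: Iterable[str],
                        lookup: Callable[[str], Optional[Clique]]) -> Dict[str, str]:
    """Map every curie reachable from a KG2 concept to its preferred curie.

    `lookup` is a hypothetical stand-in for a normalizer query;
    it returns None when the normalizer does not know the curie.
    """
    table: Dict[str, str] = {}
    for curie in kg2_curies:
        if curie in table:
            continue  # already covered by a clique seen earlier
        clique = lookup(curie)
        if clique is None:
            table[curie] = curie  # normalizer has no entry; the KG2 curie stands alone
            continue
        preferred, equivalents = clique
        # Every synonym is recorded, whether or not that synonym itself is in KG2.
        for eq in equivalents:
            table[eq] = preferred
        table[curie] = preferred
    return table
```

A curie whose concept never appears in KG2 (by any synonym) is simply absent from the resulting table, so a lookup for it fails.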

Dec 09 '20 21:12 edeutsch

ok, interesting. hmm... if we're not able to get all of the SRI node normalizer's data in bulk to incorporate into the NodeSynonymizer's build process, I suppose expand could hit up the SRI node normalizer on the fly only for those curies that the NodeSynonymizer doesn't recognize.

(I think usually the vast majority of curies other KPs return are recognized by the NodeSynonymizer, so I don't think too many curies would need to be sent to the SRI node normalizer.)
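A minimal sketch of the on-the-fly fallback proposed above: curies the local table recognizes are resolved immediately, and only the unrecognized ones are sent, in one batch, to a remote normalizer. `local_table` stands in for the NodeSynonymizer and `remote_normalize` for a batched SRI Node Normalizer request; both interfaces are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def resolve_curies(curies: List[str],
                   local_table: Dict[str, str],
                   remote_normalize: Callable[[List[str]], Dict[str, Optional[str]]]
                   ) -> Dict[str, str]:
    """Resolve curies locally first; send only unrecognized ones to a remote normalizer.

    `local_table` and `remote_normalize` are hypothetical stand-ins for the
    NodeSynonymizer and a batched SRI Node Normalizer call, respectively.
    """
    resolved = {c: local_table[c] for c in curies if c in local_table}
    # Deduplicate while keeping order, so each unknown curie is sent only once.
    unknown = [c for c in dict.fromkeys(curies) if c not in local_table]
    if unknown:
        remote = remote_normalize(unknown)  # a single batched request
        for curie in unknown:
            preferred = remote.get(curie)
            # Curies the remote service does not know either are kept as the KP returned them.
            resolved[curie] = preferred if preferred is not None else curie
    return resolved
```

Batching the leftovers into one request keeps the extra latency to a single round trip per query, which matters little if, as noted above, most KP curies are already recognized locally.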

Dec 09 '20 21:12 amykglen

Although it may not be fully solved, I think this issue is sufficiently stale that it doesn't need to be on our radar anymore. Please reopen if there is active interest in exploring this.

Apr 21 '25 17:04 edeutsch