idea for fancier Expand via NGD

Open edeutsch opened this issue 4 years ago • 47 comments

Breaking out from #1345

I probably should know about this, but I confess I don't: are we now able to Expand based purely on NGD? Is there a database of pre-computed top relationships? I recall we mused about this like a year ago, but I don't recall how it ended up. I would think all the predicates in NGD expansion would be just something bland like "associated_with" if not our made-up "biolink:has_normalized_google_distance_with"

ah, that's true, I forgot that the only accepted predicate should be some simple thing like you suggest. but the accepted node categories will be the same as for KG2.

yeah, we can expand using ngd (in some capacity) - the issue for that was #975. it doesn't use a database of pre-computed relationships at the moment; it's currently limited to connections in KG2c. so it first looks for neighbors of the input node in KG2c, and then computes ngd between that node and all its neighbors, and drops those with an NGD worse than a particular value.
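
For readers unfamiliar with the flow described above (find neighbors, score them, drop the poor ones), here is a minimal sketch. It uses the standard normalized Google distance formula over sets of PMIDs. The function names, the toy CURIEs, the corpus size, and the 0.5 cutoff are all assumptions for illustration; this is not the actual ARAX Expand code.

```python
import math

def ngd(pmids_x, pmids_y, corpus_size):
    """Standard normalized Google distance computed from two PMID sets.

    corpus_size is assumed to be larger than either set, otherwise the
    denominator can reach zero.
    """
    fx, fy = len(pmids_x), len(pmids_y)
    fxy = len(pmids_x & pmids_y)
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # no co-occurrence: treat as infinitely distant
    log_fx, log_fy = math.log(fx), math.log(fy)
    return (max(log_fx, log_fy) - math.log(fxy)) / (
        math.log(corpus_size) - min(log_fx, log_fy)
    )

def expand_by_ngd(node, neighbors, curie_to_pmids, corpus_size, max_ngd=0.5):
    """Keep only the neighbors whose NGD to `node` is at or below max_ngd."""
    source_pmids = curie_to_pmids.get(node, set())
    kept = {}
    for neighbor in neighbors:
        score = ngd(source_pmids, curie_to_pmids.get(neighbor, set()), corpus_size)
        if score <= max_ngd:  # hypothetical cutoff, not the value ARAX uses
            kept[neighbor] = score
    return kept

# Toy data: hypothetical CURIEs and PMIDs
toy_pmids = {"X:1": {1, 2, 3, 4}, "X:2": {3, 4, 5, 6}, "X:3": {9}}
print(expand_by_ngd("X:1", ["X:2", "X:3"], toy_pmids, corpus_size=1000))
```

In the toy data, X:3 shares no PMIDs with X:1, so it gets an infinite distance and is dropped.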

I'm thinking a fun project for someone in the future:

  • Precompute pairwise best ~50 NGD values between each node and all other categories
  • For the 6 million nodes in KG2C this is too much
  • But I think we can easily pick ~100,000 most important nodes and first just try to compute amongst those. That should be feasible I think?
  • The most important ~100,000 nodes? 27k UniProtKBs, 13k RXNORMs, 24k MONDOs, 18k KEGGs, etc. just focusing on these minimally redundant nodes (i.e. the clusters they belong to) would already be interesting
  • I'm thinking that a couple of days of processing could build that easily using the current FastNGD database?
  • The trick might be reducing the problem (number of nodes considered) to something we can afford
  • Then we have a nice table where, for any of the nodes that map to the above entries, we can answer which [category]s are most closely co-occurring in PubMed with CURIE X?
  • Perhaps Expand would only invoke this "KP" if the query category was vague and a traditional query yielded very little or nothing.
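
The "best ~50 per category" table in the list above could be built along these lines. This is only a sketch under assumptions: the input is an iterable of (source, target, ngd) triples plus a CURIE-to-category lookup, and both the function name and the toy CURIEs are hypothetical.

```python
import heapq
from collections import defaultdict

def top_k_by_category(scored_pairs, node_category, k=50):
    """Group scored pairs by (source, target category) and keep the k lowest NGDs.

    Returns {(source, category): [(ngd, target), ...]} sorted by ascending NGD.
    """
    buckets = defaultdict(list)
    for source, target, score in scored_pairs:
        buckets[(source, node_category[target])].append((score, target))
    return {key: heapq.nsmallest(k, items) for key, items in buckets.items()}

# Toy data: hypothetical CURIEs and scores
pairs = [
    ("D:1", "P:1", 0.3),
    ("D:1", "P:2", 0.1),
    ("D:1", "P:3", 0.2),
    ("D:1", "C:1", 0.4),
]
categories = {
    "P:1": "biolink:Protein",
    "P:2": "biolink:Protein",
    "P:3": "biolink:Protein",
    "C:1": "biolink:Drug",
}
table = top_k_by_category(pairs, categories, k=2)
print(table[("D:1", "biolink:Protein")])
```

A lookup for "which proteins co-occur most closely with CURIE X" is then a single dictionary access. For the full node set, the buckets would need to live on disk rather than in memory.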

?

edeutsch avatar Apr 06 '21 04:04 edeutsch

@chunyuma is this anything you would be interested in? It’s similar in spirit to your DTD database/expander

dkoslicki avatar Apr 06 '21 18:04 dkoslicki

Thanks @dkoslicki and @edeutsch, I'm quite interested in this, and I think it would also help the explainable DTD model by reducing the memory load if the size of KG2c is reduced

chunyuma avatar Apr 06 '21 18:04 chunyuma

One concern I have: would this affect other databases (e.g. DTD), since we would only consider the most important nodes? Some drugs or diseases might be rare and might not be in the most important node list.

chunyuma avatar Apr 06 '21 18:04 chunyuma

I don’t think this would affect DTD: the idea of “important nodes” is, I think, just to reduce the total number of pairs that need to be computed. @edeutsch can clarify if he was thinking otherwise, but basically we would not be removing nodes from KG2/KG2C, but rather computing NGD on a subset of KG2/KG2C, storing the results in a database, and intelligently using it as an expander

dkoslicki avatar Apr 06 '21 18:04 dkoslicki

Ah, I see! Thanks @dkoslicki

chunyuma avatar Apr 06 '21 19:04 chunyuma

Yes, that's correct. I think in our 6 million nodes the vast majority will never appear in a query and I think we can ignore for a first attempt at this. For example, do a search for ibuprofen and RXNORM in our KG and you get:

RXNORM:368840	Ibuprofen Oral Tablet [Genpril]		biolink:Drug
RXNORM:368823	Ibuprofen Oral Tablet [Ibu]		biolink:Drug
RXNORM:637192	Ibuprofen 10 MG/ML		biolink:Drug
RXNORM:637195	Ibuprofen 10 MG/ML [Neoprofen]		biolink:Drug
RXNORM:1722333	Ibuprofen 200 MG / Phenylephrine Hydrochloride 10 MG Oral Tablet [Advil Sinus Congestion and Pain]		biolink:Drug
RXNORM:1722329	Ibuprofen 200 MG / Phenylephrine Hydrochloride 10 MG [Advil Sinus Congestion and Pain]		biolink:Drug
RXNORM:1722330	Ibuprofen / Phenylephrine Oral Tablet [Advil Sinus Congestion and Pain]		biolink:Drug
RXNORM:373693	Ibuprofen / Pseudoephedrine Oral Capsule		biolink:Drug
RXNORM:314047	Ibuprofen 50 MG Chewable Tablet		biolink:Drug
RXNORM:577191	Chlorpheniramine / Ibuprofen / Pseudoephedrine Oral Suspension		biolink:Drug
RXNORM:1300267	Ibuprofen 200 MG Oral Tablet [Proprinal]		biolink:Drug
RXNORM:142102	Ibuprofen 50 MG/ML Topical Spray		biolink:Drug
RXNORM:372455	Codeine / Ibuprofen Oral Tablet		biolink:Drug
RXNORM:372456	Codeine / Ibuprofen Extended Release Oral Tablet		biolink:Drug
RXNORM:372449	Ibuprofen Extended Release Oral Tablet		biolink:Drug
RXNORM:201126	Ibuprofen 200 MG Oral Tablet [Motrin]		biolink:Drug
RXNORM:36761	ibuprofen lysine		biolink:Drug
RXNORM:484259	Ibuprofen / Oxycodone		biolink:Drug
RXNORM:1300263	Ibuprofen 200 MG [Proprinal]		biolink:Drug
RXNORM:1300264	Ibuprofen Oral Tablet [Proprinal]		biolink:Drug
RXNORM:392668	Ibuprofen 0.05 MG/MG / LEVOMENTHOL 0.03 MG/MG Topical Gel		biolink:Drug
RXNORM:392617	Ibuprofen / Menthol		biolink:Drug
RXNORM:1297369	Chlorpheniramine Maleate 0.2 MG/ML / Ibuprofen 20 MG/ML / Pseudoephedrine Hydrochloride 3 MG/ML Oral Suspension		biolink:Drug
RXNORM:380819	Ibuprofen Topical Foam		biolink:Drug
RXNORM:1297390	Chlorpheniramine Maleate 2 MG / Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG Oral Tablet		biolink:Drug
RXNORM:367939	Ibuprofen / Pseudoephedrine Oral Tablet [Advil Cold and Sinus]		biolink:Drug
RXNORM:333683	Ibuprofen 40 MG/ML		biolink:Drug
RXNORM:1295502	Ibuprofen Chewable Product		biolink:Drug
RXNORM:1090449	Ibuprofen / Pseudoephedrine Oral Tablet [Wal-Profen Cold and Sinus]		biolink:Drug
RXNORM:1158493	Famotidine / Ibuprofen Oral Product		biolink:Drug
RXNORM:5640	Ibuprofen		biolink:Drug
RXNORM:202098	Ibuprofen 800 MG Oral Tablet [Motrin]		biolink:Drug
RXNORM:643059	Diphenhydramine / Ibuprofen Oral Tablet		biolink:Drug
RXNORM:637197	2 ML Ibuprofen 10 MG/ML Injection [Neoprofen]		biolink:Drug
RXNORM:544393	Ibuprofen 20 MG/ML Oral Suspension [Motrin]		biolink:Drug
RXNORM:544391	Ibuprofen 20 MG/ML [Motrin]		biolink:Drug
RXNORM:544392	Ibuprofen Oral Suspension [Motrin]		biolink:Drug
RXNORM:1007410	Carisoprodol / Ibuprofen		biolink:Drug
RXNORM:1007329	Ibuprofen / Phenylephrine		biolink:Drug
RXNORM:1007373	Ibuprofen / Vitamin B 12		biolink:Drug
RXNORM:1007917	Hydroxocobalamin / Ibuprofen		biolink:Drug
RXNORM:1007482	Ibuprofen / Lidocaine		biolink:Drug
RXNORM:1369775	Ibuprofen 200 MG / Phenylephrine Hydrochloride 10 MG Oral Tablet		biolink:Drug
RXNORM:1007823	cyclonium / Ibuprofen		biolink:Drug
RXNORM:2045474	Ibuprofen Oral Tablet [Dragon Tabs]		biolink:Drug
RXNORM:2045473	Ibuprofen 200 MG [Dragon Tabs]		biolink:Drug
RXNORM:2045477	Ibuprofen 200 MG Oral Tablet [Dragon Tabs]		biolink:Drug
RXNORM:567707	Ibuprofen 400 MG [Ibu]		biolink:Drug
RXNORM:567715	Ibuprofen 600 MG [Ibu]		biolink:Drug
RXNORM:567719	Ibuprofen 800 MG [Ibu]		biolink:Drug
RXNORM:1299021	Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG Oral Tablet		biolink:Drug
RXNORM:1299022	Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG Oral Tablet [Advil Cold and Sinus]		biolink:Drug
RXNORM:1299020	Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG Oral Capsule [Advil Cold and Sinus]		biolink:Drug
RXNORM:1299018	Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG Oral Capsule		biolink:Drug
RXNORM:1299019	Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG [Advil Cold and Sinus]		biolink:Drug
RXNORM:643063	Diphenhydramine / Ibuprofen Oral Tablet [Advil PM]		biolink:Drug
RXNORM:814985	Ibuprofen / Tolperisone		biolink:Drug
RXNORM:643100	Ibuprofen 200 MG [Wal-Profen]		biolink:Drug
RXNORM:643101	Ibuprofen Oral Tablet [Wal-Profen]		biolink:Drug
RXNORM:643102	Ibuprofen 200 MG Oral Tablet [Wal-Profen]		biolink:Drug
RXNORM:393432	Ibuprofen 0.1 MG/MG		biolink:Drug
RXNORM:393550	Ibuprofen / LEVOMENTHOL Topical Gel		biolink:Drug
RXNORM:368308	Hydrocodone / Ibuprofen Oral Tablet [Vicoprofen]		biolink:Drug
RXNORM:1159018	Famotidine / Ibuprofen Pill		biolink:Drug
RXNORM:565689	Ibuprofen 200 MG [Motrin]		biolink:Drug
RXNORM:854761	Ibuprofen 40 MG/ML [Motrin]		biolink:Drug
RXNORM:854762	Ibuprofen 40 MG/ML Oral Suspension [Motrin]		biolink:Drug
RXNORM:795911	Ibuprofen / Pseudoephedrine Oral Capsule [Advil Cold and Sinus]		biolink:Drug
RXNORM:335000	Ibuprofen 50 MG/ML		biolink:Drug
RXNORM:645634	Diphenhydramine / Ibuprofen Oral Capsule		biolink:Drug
RXNORM:1429044	Ibuprofen, Sodium Salt		biolink:Drug
RXNORM:565143	Ibuprofen 200 MG [Advil]		biolink:Drug
RXNORM:854183	8 ML Ibuprofen 100 MG/ML Injection		biolink:Drug
RXNORM:854182	Ibuprofen 100 MG/ML		biolink:Drug
RXNORM:854185	Ibuprofen 100 MG/ML [Caldolor]		biolink:Drug
RXNORM:854187	8 ML Ibuprofen 100 MG/ML Injection [Caldolor]		biolink:Drug
RXNORM:197803	Ibuprofen 20 MG/ML Oral Suspension		biolink:Drug
RXNORM:197806	Ibuprofen 600 MG Oral Tablet		biolink:Drug
RXNORM:197805	Ibuprofen 400 MG Oral Tablet		biolink:Drug
RXNORM:197807	Ibuprofen 800 MG Oral Tablet		biolink:Drug
RXNORM:993798	Ibuprofen / Phenylephrine Oral Tablet		biolink:Drug
RXNORM:566095	Ibuprofen 800 MG [Motrin]		biolink:Drug
RXNORM:1008079	homatropine / Ibuprofen		biolink:Drug
RXNORM:820465	Carisoprodol / Dexamethasone / Ibuprofen		biolink:Drug
RXNORM:1008170	Ibuprofen / Niacin		biolink:Drug
RXNORM:380845	Ibuprofen 0.05 MG/MG		biolink:Drug
RXNORM:198405	Ibuprofen 100 MG Oral Tablet		biolink:Drug
RXNORM:380813	Ibuprofen 300 MG Extended Release Oral Capsule		biolink:Drug
RXNORM:380812	Ibuprofen Extended Release Oral Capsule		biolink:Drug
RXNORM:821036	Chlorzoxazone / Ibuprofen		biolink:Drug
RXNORM:1008502	Ibuprofen / pseudoisocytidine		biolink:Drug
RXNORM:1165305	Ibuprofen / Oxycodone Oral Product		biolink:Drug
RXNORM:1165307	Ibuprofen / Phenylephrine Oral Product		biolink:Drug
RXNORM:1165306	Ibuprofen / Oxycodone Pill		biolink:Drug
RXNORM:1165309	Ibuprofen / Pseudoephedrine Oral Liquid Product		biolink:Drug
RXNORM:1165308	Ibuprofen / Phenylephrine Pill		biolink:Drug
RXNORM:1165310	Ibuprofen / Pseudoephedrine Oral Product		biolink:Drug
RXNORM:1165311	Ibuprofen / Pseudoephedrine Pill		biolink:Drug
RXNORM:316074	Ibuprofen 200 MG		biolink:Drug
RXNORM:316073	Ibuprofen 20 MG/ML		biolink:Drug
RXNORM:316076	Ibuprofen 50 MG		biolink:Drug
RXNORM:316075	Ibuprofen 300 MG		biolink:Drug
RXNORM:316078	Ibuprofen 800 MG		biolink:Drug
RXNORM:316077	Ibuprofen 600 MG		biolink:Drug
RXNORM:316072	Ibuprofen 100 MG		biolink:Drug
RXNORM:1008440	Ibuprofen / Scopolamine		biolink:Drug
RXNORM:1165299	Ibuprofen / LEVOMENTHOL Topical Product		biolink:Drug
RXNORM:1940584	Ibuprofen / Phenylephrine Oral Tablet [Wal-Profen Congestion Relief and Pain]		biolink:Drug
RXNORM:1940583	Ibuprofen 200 MG / Phenylephrine Hydrochloride 10 MG [Wal-Profen Congestion Relief and Pain]		biolink:Drug
RXNORM:1940587	Ibuprofen 200 MG / Phenylephrine Hydrochloride 10 MG Oral Tablet [Wal-Profen Congestion Relief and Pain]		biolink:Drug
RXNORM:389244	Ibuprofen 0.1 MG/MG Topical Gel		biolink:Drug
RXNORM:644895	Diphenhydramine / Ibuprofen		biolink:Drug
RXNORM:900434	Ibuprofen 200 MG Oral Tablet [Addaprin]		biolink:Drug
RXNORM:900433	Ibuprofen Oral Tablet [Addaprin]		biolink:Drug
RXNORM:900432	Ibuprofen 200 MG [Addaprin]		biolink:Drug
RXNORM:644386	Ibuprofen 200 MG Oral Capsule [Wal-Profen]		biolink:Drug
RXNORM:644385	Ibuprofen Oral Capsule [Wal-Profen]		biolink:Drug
RXNORM:93574	Ibuprofen Oral Tablet [Nuprin]		biolink:Drug
RXNORM:1009128	Caffeine / Ergotamine / Ibuprofen		biolink:Drug
RXNORM:1009037	Ibuprofen / Methocarbamol		biolink:Drug
RXNORM:204442	Ibuprofen 40 MG/ML Oral Suspension		biolink:Drug
RXNORM:1152222	Diphenhydramine / Ibuprofen Oral Product		biolink:Drug
RXNORM:1152223	Diphenhydramine / Ibuprofen Pill		biolink:Drug
RXNORM:606989	Ibuprofen Oral Capsule [Motrin]		biolink:Drug
RXNORM:606990	Ibuprofen 200 MG Oral Capsule [Motrin]		biolink:Drug
RXNORM:317388	Ibuprofen 400 MG		biolink:Drug
RXNORM:724134	Hydrocodone / Ibuprofen Oral Tablet [Reprexain]		biolink:Drug
RXNORM:206917	Ibuprofen 800 MG Oral Tablet [Ibu]		biolink:Drug
RXNORM:206913	Ibuprofen 600 MG Oral Tablet [Ibu]		biolink:Drug
RXNORM:206905	Ibuprofen 400 MG Oral Tablet [Ibu]		biolink:Drug
RXNORM:2178275	200 ML Ibuprofen 4 MG/ML Injection [Caldolor]		biolink:Drug
RXNORM:2178273	200 ML Ibuprofen 4 MG/ML Injection		biolink:Drug
RXNORM:2178274	Ibuprofen 4 MG/ML [Caldolor]		biolink:Drug
RXNORM:2178272	Ibuprofen 4 MG/ML		biolink:Drug
RXNORM:377956	Ibuprofen Topical Gel		biolink:Drug
RXNORM:758973	Hydrocodone / Ibuprofen Oral Tablet [Ibudone]		biolink:Drug
RXNORM:901814	Diphenhydramine Hydrochloride 25 MG / Ibuprofen 200 MG Oral Capsule		biolink:Drug
RXNORM:901817	Diphenhydramine Citrate 38 MG / Ibuprofen 200 MG [Advil PM]		biolink:Drug
RXNORM:901818	Diphenhydramine Citrate 38 MG / Ibuprofen 200 MG Oral Tablet [Advil PM]		biolink:Drug
RXNORM:817356	Acetaminophen / Codeine / Ibuprofen		biolink:Drug
RXNORM:1049589	Ibuprofen 400 MG / Oxycodone Hydrochloride 5 MG Oral Tablet		biolink:Drug
RXNORM:1299088	Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG [Wal-Profen Cold and Sinus]		biolink:Drug
RXNORM:1299089	Ibuprofen 200 MG / Pseudoephedrine Hydrochloride 30 MG Oral Tablet [Wal-Profen Cold and Sinus]		biolink:Drug
RXNORM:567695	Ibuprofen 200 MG [Nuprin]		biolink:Drug
RXNORM:567680	Ibuprofen 20 MG/ML [Advil]		biolink:Drug
RXNORM:567688	Ibuprofen 200 MG [Genpril]		biolink:Drug
RXNORM:710303	Codeine / Ibuprofen		biolink:Drug
RXNORM:401976	Ibuprofen 300 MG / Pseudoephedrine 45 MG Oral Capsule		biolink:Drug
RXNORM:1310487	Ibuprofen 20 MG/ML / Pseudoephedrine Hydrochloride 3 MG/ML Oral Suspension		biolink:Drug
RXNORM:1310499	Chlorpheniramine / Ibuprofen / Phenylephrine Oral Product		biolink:Drug
RXNORM:895658	Diphenhydramine / Ibuprofen Oral Tablet [Motrin PM]		biolink:Drug
RXNORM:895666	Diphenhydramine Citrate 38 MG / Ibuprofen 200 MG Oral Tablet [Motrin PM]		biolink:Drug
RXNORM:895664	Diphenhydramine Citrate 38 MG / Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:895665	Diphenhydramine Citrate 38 MG / Ibuprofen 200 MG [Motrin PM]		biolink:Drug
RXNORM:1310502	Chlorpheniramine / Ibuprofen / Phenylephrine		biolink:Drug
RXNORM:1310503	Chlorpheniramine Maleate 4 MG / Ibuprofen 200 MG / Phenylephrine Hydrochloride 10 MG Oral Tablet		biolink:Drug
RXNORM:1310500	Chlorpheniramine / Ibuprofen / Phenylephrine Pill		biolink:Drug
RXNORM:1310501	Chlorpheniramine / Ibuprofen / Phenylephrine Oral Tablet		biolink:Drug
RXNORM:377325	Ibuprofen Topical Spray		biolink:Drug
RXNORM:250418	Ibuprofen 800 MG Extended Release Oral Tablet		biolink:Drug
RXNORM:1100064	Famotidine / Ibuprofen Oral Tablet		biolink:Drug
RXNORM:1100065	Famotidine / Ibuprofen		biolink:Drug
RXNORM:1100068	Famotidine 26.6 MG / Ibuprofen 800 MG [Duexis]		biolink:Drug
RXNORM:1100069	Famotidine / Ibuprofen Oral Tablet [Duexis]		biolink:Drug
RXNORM:1100066	Famotidine 26.6 MG / Ibuprofen 800 MG Oral Tablet		biolink:Drug
RXNORM:1100070	Famotidine 26.6 MG / Ibuprofen 800 MG Oral Tablet [Duexis]		biolink:Drug
RXNORM:483322	Ibuprofen / Oxycodone Oral Tablet		biolink:Drug
RXNORM:226617	Ibuprofen 50 MG/ML Topical Foam		biolink:Drug
RXNORM:214652	Ibuprofen / Pseudoephedrine		biolink:Drug
RXNORM:792241	Ibuprofen Chewable Tablet [Motrin]		biolink:Drug
RXNORM:792240	Ibuprofen 100 MG [Motrin]		biolink:Drug
RXNORM:792242	Ibuprofen 100 MG Chewable Tablet [Motrin]		biolink:Drug
RXNORM:214627	Hydrocodone / Ibuprofen		biolink:Drug
RXNORM:902632	Diphenhydramine / Ibuprofen Oral Capsule [Advil PM Liqui Gels]		biolink:Drug
RXNORM:902633	Diphenhydramine Hydrochloride 25 MG / Ibuprofen 200 MG Oral Capsule [Advil PM Liqui Gels]		biolink:Drug
RXNORM:902631	Diphenhydramine Hydrochloride 25 MG / Ibuprofen 200 MG [Advil PM Liqui Gels]		biolink:Drug
RXNORM:153008	Ibuprofen 200 MG Oral Tablet [Advil]		biolink:Drug
RXNORM:377732	Ibuprofen Topical Cream		biolink:Drug
RXNORM:370674	Ibuprofen Oral Tablet		biolink:Drug
RXNORM:370673	Ibuprofen Chewable Tablet		biolink:Drug
RXNORM:370672	Ibuprofen Oral Suspension		biolink:Drug
RXNORM:370678	Ibuprofen / Pseudoephedrine Oral Tablet		biolink:Drug
RXNORM:370677	Ibuprofen / Pseudoephedrine Oral Suspension		biolink:Drug
RXNORM:370676	Hydrocodone / Ibuprofen Oral Tablet		biolink:Drug
RXNORM:370675	Ibuprofen Oral Capsule		biolink:Drug
RXNORM:1359097	Ibuprofen 200 MG Oral Tablet [Ibutab]		biolink:Drug
RXNORM:1359093	Ibuprofen 200 MG [Ibutab]		biolink:Drug
RXNORM:1359094	Ibuprofen Oral Tablet [Ibutab]		biolink:Drug
RXNORM:818102	Acetaminophen / Ibuprofen		biolink:Drug
RXNORM:206878	Ibuprofen 20 MG/ML Oral Suspension [Advil]		biolink:Drug
RXNORM:206886	Ibuprofen 200 MG Oral Tablet [Genpril]		biolink:Drug
RXNORM:206893	Ibuprofen 200 MG Oral Tablet [Nuprin]		biolink:Drug
RXNORM:404789	Chlorpheniramine / Ibuprofen / Pseudoephedrine		biolink:Drug
RXNORM:1154775	Chlorpheniramine / Ibuprofen / Pseudoephedrine Oral Liquid Product		biolink:Drug
RXNORM:1154776	Chlorpheniramine / Ibuprofen / Pseudoephedrine Oral Product		biolink:Drug
RXNORM:1154777	Chlorpheniramine / Ibuprofen / Pseudoephedrine Pill		biolink:Drug
RXNORM:1154818	Codeine / Ibuprofen Oral Product		biolink:Drug
RXNORM:1154819	Codeine / Ibuprofen Pill		biolink:Drug
RXNORM:1791362	Ibuprofen Injection [Caldolor]		biolink:Drug
RXNORM:1791366	Ibuprofen Injection [Neoprofen]		biolink:Drug
RXNORM:859331	Hydrocodone Bitartrate 10 MG / Ibuprofen 200 MG Oral Tablet [Reprexain]		biolink:Drug
RXNORM:859330	Hydrocodone Bitartrate 10 MG / Ibuprofen 200 MG [Reprexain]		biolink:Drug
RXNORM:859315	Hydrocodone Bitartrate 10 MG / Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:859317	Hydrocodone Bitartrate 10 MG / Ibuprofen 200 MG Oral Tablet [Ibudone]		biolink:Drug
RXNORM:859316	Hydrocodone Bitartrate 10 MG / Ibuprofen 200 MG [Ibudone]		biolink:Drug
RXNORM:310965	Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:310963	Ibuprofen 100 MG Chewable Tablet		biolink:Drug
RXNORM:310964	Ibuprofen 200 MG Oral Capsule		biolink:Drug
RXNORM:1101917	Ibuprofen 200 MG [Counteract IB]		biolink:Drug
RXNORM:1101918	Ibuprofen Oral Tablet [Counteract IB]		biolink:Drug
RXNORM:1101919	Ibuprofen 200 MG Oral Tablet [Counteract IB]		biolink:Drug
RXNORM:731528	Ibuprofen Chewable Tablet [Advil]		biolink:Drug
RXNORM:731529	Ibuprofen 50 MG Chewable Tablet [Advil]		biolink:Drug
RXNORM:731527	Ibuprofen 50 MG [Advil]		biolink:Drug
RXNORM:731535	Ibuprofen 100 MG Oral Tablet [Advil]		biolink:Drug
RXNORM:731536	Ibuprofen 100 MG Chewable Tablet [Advil]		biolink:Drug
RXNORM:731533	Ibuprofen 200 MG Oral Capsule [Advil]		biolink:Drug
RXNORM:731534	Ibuprofen 100 MG [Advil]		biolink:Drug
RXNORM:731531	Ibuprofen 40 MG/ML Oral Suspension [Advil]		biolink:Drug
RXNORM:731532	Ibuprofen Oral Capsule [Advil]		biolink:Drug
RXNORM:731530	Ibuprofen 40 MG/ML [Advil]		biolink:Drug
RXNORM:227159	Ibuprofen 200 MG Extended Release Oral Capsule		biolink:Drug
RXNORM:858798	Hydrocodone Bitartrate 7.5 MG / Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:858783	Hydrocodone Bitartrate 5 MG / Ibuprofen 200 MG [Reprexain]		biolink:Drug
RXNORM:858780	Hydrocodone Bitartrate 5 MG / Ibuprofen 200 MG Oral Tablet [Ibudone]		biolink:Drug
RXNORM:858784	Hydrocodone Bitartrate 5 MG / Ibuprofen 200 MG Oral Tablet [Reprexain]		biolink:Drug
RXNORM:858772	Hydrocodone Bitartrate 2.5 MG / Ibuprofen 200 MG Oral Tablet [Reprexain]		biolink:Drug
RXNORM:858771	Hydrocodone Bitartrate 2.5 MG / Ibuprofen 200 MG [Reprexain]		biolink:Drug
RXNORM:858770	Hydrocodone Bitartrate 2.5 MG / Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:858779	Hydrocodone Bitartrate 5 MG / Ibuprofen 200 MG [Ibudone]		biolink:Drug
RXNORM:858778	Hydrocodone Bitartrate 5 MG / Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:1292323	Diphenhydramine Citrate 38 MG / Ibuprofen 200 MG Oral Capsule		biolink:Drug
RXNORM:541713	Ibuprofen 800 MG Oral Tablet [Samson 8]		biolink:Drug
RXNORM:541712	Ibuprofen Oral Tablet [Samson 8]		biolink:Drug
RXNORM:541711	Ibuprofen 800 MG [Samson 8]		biolink:Drug
RXNORM:93358	Ibuprofen Oral Tablet [Motrin]		biolink:Drug
RXNORM:1542984	Hydrocodone Bitartrate 10 MG / Ibuprofen 200 MG [Xylon]		biolink:Drug
RXNORM:1542988	Hydrocodone Bitartrate 10 MG / Ibuprofen 200 MG Oral Tablet [Xylon]		biolink:Drug
RXNORM:1542985	Hydrocodone / Ibuprofen Oral Tablet [Xylon]		biolink:Drug
RXNORM:1747293	Ibuprofen Injection		biolink:Drug
RXNORM:1747294	2 ML Ibuprofen 10 MG/ML Injection		biolink:Drug
RXNORM:687386	Ibuprofen / LEVOMENTHOL		biolink:Drug
RXNORM:858838	Hydrocodone Bitartrate 7.5 MG / Ibuprofen 200 MG Oral Tablet [Vicoprofen]		biolink:Drug
RXNORM:858837	Hydrocodone Bitartrate 7.5 MG / Ibuprofen 200 MG [Vicoprofen]		biolink:Drug
RXNORM:379847	Ibuprofen 3 MG/ML		biolink:Drug
RXNORM:850424	Ibuprofen 200 MG Oral Tablet [Ibuprohm]		biolink:Drug
RXNORM:850423	Ibuprofen Oral Tablet [Ibuprohm]		biolink:Drug
RXNORM:850422	Ibuprofen 200 MG [Ibuprohm]		biolink:Drug
RXNORM:2184152	Ibuprofen 200 MG / Phenylephrine Hydrochloride 5 MG Oral Tablet		biolink:Drug
RXNORM:997280	Codeine Phosphate 20 MG / Ibuprofen 300 MG Extended Release Oral Tablet		biolink:Drug
RXNORM:1156280	Ibuprofen Topical Product		biolink:Drug
RXNORM:1156275	Ibuprofen Injectable Product		biolink:Drug
RXNORM:1156278	Ibuprofen Pill		biolink:Drug
RXNORM:1156277	Ibuprofen Oral Product		biolink:Drug
RXNORM:1156276	Ibuprofen Oral Liquid Product		biolink:Drug
RXNORM:997165	Codeine Phosphate 12.8 MG / Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:997164	Codeine Phosphate 12.5 MG / Ibuprofen 200 MG Oral Tablet		biolink:Drug
RXNORM:365861	Ibuprofen Oral Suspension [Advil]		biolink:Drug
RXNORM:806013	Ibuprofen 100 MG Oral Tablet [Motrin]		biolink:Drug
RXNORM:1597118	Chondroitin Sulfates / Glucosamine / Ibuprofen		biolink:Drug
RXNORM:91703	Ibuprofen Oral Tablet [Advil]		biolink:Drug
RXNORM:141998	Ibuprofen 50 MG/ML Topical Cream		biolink:Drug
RXNORM:141997	Ibuprofen 0.05 MG/MG Topical Gel		biolink:Drug
RXNORM:141993	Ibuprofen 3 MG/ML Oral Suspension		biolink:Drug
RXNORM:851211	60 (caffeine 65 MG / riboflavin 6.25 MG / thiamine 25 MG / vitamin B 12 0.125 MG / vitamin B6 25 MG Oral Capsule) / 60 (ibuprofen 800 MG Oral Tablet) Pack	biolink:Drug
RXNORM:1162789	Hydrocodone / Ibuprofen Pill		biolink:Drug
RXNORM:1162788	Hydrocodone / Ibuprofen Oral Product		biolink:Drug
RXNORM:405928	Chlorpheniramine / Ibuprofen / Pseudoephedrine Oral Tablet		biolink:Drug

I wonder if we can simplify this list further so we only compute on a handful of these rather than the huge list.

edeutsch avatar Apr 06 '21 19:04 edeutsch

I'm not sure whether this method can remove some of the generic concepts in KG2c, but I just want to point out this problem here. I think some nodes in KG2c (please see the list below) have generic semantic meanings and might also never appear in a query (e.g. MONDO:0004992, which is cancer, and SO:0001217, which is protein_coding_gene). These nodes normally have an extremely high in-degree.

Please ignore the accuracy of the category column below: the table is summarized from my local version of KG2c, which excluded some node types (e.g. biolink:NamedThing, biolink:MolecularEntity) and caused NodeSynonymizer to assign some wrong categories.

| curie_id | name | category | indegree | outdegree |
| --- | --- | --- | --- | --- |
| SO:0001217 | protein_coding_gene | biolink:Gene | 97419 | 0 |
| LOINC:LP208893-0 | Pt | biolink:Procedure | 83179 | 1 |
| CHEMBL.COMPOUND:CHEMBL87852 | Hexadecanoic acid (S)-2-hexadecanoyloxy-1-hydr... | biolink:ChemicalSubstance | 59922 | 20571 |
| UMLS:C0025255 | Membrane | biolink:GrossAnatomicalStructure | 59623 | 2243 |
| CHEMBL.COMPOUND:CHEMBL307679 | Phosphoric acid mono-[5-(4-amino-2-oxo-2H-pyri... | biolink:ChemicalSubstance | 57431 | 35842 |
| CHEMBL.COMPOUND:CHEMBL1623949 | | biolink:ChemicalSubstance | 54698 | 51477 |
| CHEMBL.COMPOUND:CHEMBL2286758 | 1-palmitoyl-2-(3-trans)-hexadecenoyl-sn-glycer... | biolink:ChemicalSubstance | 50788 | 31647 |
| KEGG:C00269 | CDP-diacylglycerol | biolink:Metabolite | 42988 | 30127 |
| LOINC:LP7753-9 | Qn | biolink:Procedure | 41873 | 0 |
| CHEMBL.COMPOUND:CHEMBL3343985 | Trilinolein | biolink:ChemicalSubstance | 39900 | 11874 |
| DRUGBANK:DB03429 | Tetrastearoyl cardiolipin | biolink:ChemicalSubstance | 38460 | 20150 |
| MONDO:0000001 | disease or disorder | biolink:Disease | 26125 | 9246 |
| UMLS:C0007634 | Cell | biolink:Cell | 25666 | 8887 |
| LOINC:LP7751-3 | Ord | biolink:Procedure | 24643 | 0 |
| LOINC:LP7567-3 | Ser | biolink:Procedure | 21673 | 0 |
| MONDO:0004992 | cancer | biolink:Disease | 21311 | 10623 |
| CHEBI:15378 | hydron | biolink:ChemicalSubstance | 21017 | 54538 |
| CHEBI:36080 | protein | biolink:Protein | 20927 | 1032 |
| CHEMBL.COMPOUND:CHEMBL1098659 | WATER | biolink:ChemicalSubstance | 19740 | 60653 |
| PR:000029067 | Homo sapiens protein | biolink:Protein | 19108 | 1 |
| PR:000029032 | Mus musculus protein | biolink:Protein | 17115 | 1 |
| LOINC:LA4634-7 | Patient | biolink:Procedure | 16877 | 0 |
| UMLS:C0040300 | Portion of tissue | biolink:GrossAnatomicalStructure | 16106 | 2477 |
| PR:000029045 | Arabidopsis thaliana protein | biolink:Protein | 15834 | 1 |
| CHEMBL.COMPOUND:CHEMBL1488784 | SID11113658 | biolink:ChemicalSubstance | 15825 | 16530 |
| OMIM:MTHU000046 | Growth | biolink:PhenotypicFeature | 15342 | 2644 |
| CHEMBL.COMPOUND:CHEMBL3321993 | TF | biolink:ChemicalSubstance | 14334 | 12663 |
| LOINC:LP20667-9 | Ab | biolink:Procedure | 14307 | 0 |
| UMLS:C0006104 | Brain | biolink:GrossAnatomicalStructure | 13475 | 686 |
| VANDF:4017451 | Liver | biolink:ChemicalSubstance | 13091 | 833 |
| LOINC:MTHU000096 | Microbiology | biolink:Procedure | 12785 | 1 |

chunyuma avatar Apr 06 '21 19:04 chunyuma

Here’s an oddball idea: if a bioentity never shows up in any PubMed abstract, it’s probably not “too important.” That wouldn’t get rid of terms like “Microbiology” and “brain”, but it would remove things like “1-palmitoyl-2-(3-trans)-hexadecenoyl-sn-glycer...”. And just a side note: I think some care will be needed for the generic terms. I have seen SME queries that ask things like “which genes are expressed in the liver?” So we would want to keep that generic term.

dkoslicki avatar Apr 06 '21 21:04 dkoslicki

That is an interesting question for the FastNGDers (@finnagin @amykglen ?): of the 6.1 million nodes in KG2.5.2C, how many have at least one PMID associated with them in our database? That alone may chop the list down substantially, although probably not enough. One thing doesn't make sense to me. KG2.5.2 has 10 million nodes, while KG2.5.2C has 6 million nodes. Not a big drop. Yet nearly every concept in KG2C that I've cared about has had at least a dozen nodes in the cluster. So this suggests that there are millions of nodes that probably have no friends, and I wonder if they're useful.

As an example, I do notice that we have 1.78 million nodes that are just NCBITaxons. I wonder if this is really a useful thing. I wonder if we could remove 1.77 million NCBITaxon nodes without sacrificing any practical query capability.

edeutsch avatar Apr 06 '21 22:04 edeutsch

yeah, I believe only 1.6 million KG2c nodes have one or more PMIDs in the fast NGD database. helps quite a bit for sure, though 1.6m * 1.6m is probably still too much. :)

(and indeed I think the majority of nodes in KG2c are almost never returned in ARAX queries. for example, it's overwhelmingly the nodes with PMIDs that get returned in ARAX queries; that's why the fastNGD 'hit rate' is in the 99% range, even though only about a quarter of the KG2c nodes have any PMIDs in the fastNGD database.)

amykglen avatar Apr 06 '21 22:04 amykglen

I have some fanatical programming friends who insist that the smallest possible program that can still do the job is the best one. I wonder if some element of this ethos can be applied to KG2C? What is the smallest possible number of nodes we can have without sacrificing much at all?

edeutsch avatar Apr 06 '21 22:04 edeutsch

good question. :) I believe @timsyoon found that there are about 1.9 million isolated nodes in KG2c. those will of course never be returned in Translator queries, since they're not connected to anything. that's a good chunk right there we could probably get rid of with zero impact!

amykglen avatar Apr 07 '21 00:04 amykglen

of course on the flip side, one could argue that those are exactly the kind of nodes that we want to look for edges for. So that they become connected!

Just not the ones that are "ibuprofen 21 mg", "ibuprofen 37 mg" etc.

edeutsch avatar Apr 07 '21 00:04 edeutsch

1.6M^2 = 2.56 trillion shouldn't be too much ;) if we start with those at least and keep track of the hit rate, I think that could work. Just need more silicon to throw at the problem
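
A quick check of that figure, as a sketch: NGD is symmetric in its two arguments by definition, so in principle only the unordered pairs need to be scored. The function name below is hypothetical, and whether a given job layout can actually exploit the symmetry is a separate question.

```python
def pair_counts(n_nodes):
    """Return (ordered pairs including self-pairs, unordered distinct pairs)."""
    ordered = n_nodes * n_nodes
    unordered = n_nodes * (n_nodes - 1) // 2
    return ordered, unordered

ordered, unordered = pair_counts(1_600_000)
print(f"{ordered:,} ordered pairs, {unordered:,} unordered pairs")
```

So the 2.56 trillion figure counts every pair twice. Scoring each unordered pair once is roughly 1.28 trillion calculations.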

dkoslicki avatar Apr 07 '21 01:04 dkoslicki

> 1.6M^2 = 2.56 trillion shouldn't be too much

@dkoslicki, based on my investigation, running the 2.56 trillion calculations in parallel on our server would probably take 139 days, and even longer if we want to avoid affecting other users' jobs. We might need more computational resources.

chunyuma avatar Jun 05 '21 22:06 chunyuma

NCATS has provided us funds to do such large scale computations (and thankfully, as opposed to DTD, this database will rarely if ever need to be updated, and even then, the whole thing will not need to be updated, just new entries).

Let me know approximately how many core hours this would take, and I can see what ACI can do for us.

dkoslicki avatar Jun 05 '21 22:06 dkoslicki

@dkoslicki, I basically used the same approach as I did for building the DTD probability database. For each of the 1.6M nodes, I submitted a job that calculates the NGD score between this node and all other 1.6M nodes. Each job uses only one process, via Python's map function (for some reason, I found that using map runs even faster than multiprocessing). Each job consumes:

User time (seconds): 99.06
System time (seconds): 17.46
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:48.18
Maximum resident set size (kbytes): 17342956 (~17GB)

So I can only run around 25 jobs at a time, which consumes around 400-500 GB of RAM. In theory, each job uses only one core and around 17 GB, but multiple jobs running at the same time on the same server might affect each other. So I think it would be better if we could get computational resources from ACI that can automatically assign jobs to different cores, each with ~17 GB of RAM available.

@dkoslicki, if you remember, we previously purchased a virtual cluster from ACI, but so far they haven't been able to help us resolve the job allocation problem, which means that we can't submit too many jobs at the same time. If we submitted, say, 1000 jobs at once, it might cause problems with job allocation for the vcores.

chunyuma avatar Jun 05 '21 23:06 chunyuma

Assuming the jobs don't affect each other, each job takes around 2 minutes to calculate the NGD scores between one CURIE and the other 1.6M nodes. Since we have 1,672,684 nodes in total, we could finish all computations in about a week (1672684 / (30 jobs per hour x 24 hours per day x 300 concurrent jobs) = 7.74 days) if we can run 300 jobs at the same time.
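
The estimate above, plus the core-hours figure asked about earlier, can be reproduced with a short sketch. It assumes one core per job (the measured CPU usage was 107%) and a flat 2 minutes per job; the function names are hypothetical.

```python
def estimate_days(n_jobs, minutes_per_job, concurrent_jobs):
    """Wall-clock days if `concurrent_jobs` run side by side without slowdown."""
    jobs_per_day = (60 / minutes_per_job) * 24 * concurrent_jobs
    return n_jobs / jobs_per_day

def estimate_core_hours(n_jobs, minutes_per_job):
    """Total core-hours, assuming each job occupies a single core."""
    return n_jobs * minutes_per_job / 60

days = estimate_days(1_672_684, 2, 300)
core_hours = estimate_core_hours(1_672_684, 2)
print(f"{days:.2f} days, {core_hours:,.0f} core-hours")
```

Under these assumptions the total comes to roughly 56,000 core-hours, regardless of how many jobs run concurrently.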

chunyuma avatar Jun 05 '21 23:06 chunyuma

I'm thinking it would be sensible to try a small-scale experiment to see if the approach yields useful results before starting thousands of hours of computation. Is there already a pilot? Perhaps run just the MONDOs against all 20k Swiss-Prot reviewed proteins? Can we reproduce some known connections? Can we generate some plausible new ones that we would want to report?

edeutsch avatar Jun 06 '21 07:06 edeutsch

> Perhaps run just the MONDOs against all 20k Swiss-Prot reviewed proteins?

Great idea, @edeutsch. I can have a try.

chunyuma avatar Jun 06 '21 15:06 chunyuma

> Can we reproduce some known connections? Can we generate some plausible new ones that we would want to report?

Hi @edeutsch, I have already computed the ngd scores of all MONDOs against all 20k Swiss-Prot reviewed proteins. How can we know if it can reproduce some known connections? Or generate some plausible new ones? Is there a threshold to filter them for checking some known connections?

chunyuma avatar Jun 07 '21 17:06 chunyuma

Can you post a plot and some summary statistics of the NGDs that you calculated @chunyuma? That will help in determining what constitutes a meaningfully "small" NGD score

dkoslicki avatar Jun 07 '21 17:06 dkoslicki

Agreed, and I also think that looking at a few examples would be useful.

Example 1: MONDO:0013989

{
   "edges": {
      "e00": {
         "subject":   "n00",
         "object":    "n01"
      }
   },
   "nodes": {
      "n00": {
         "ids":        ["MONDO:0013989"]
      },
      "n01": {
         "categories":  ["biolink:Protein"]
      }
   }
}

Current ARAX is returning 69 results, but only the top 5 have NGDs. The rest, no NGDs. What are the top 50 proteins for MONDO:0013989 based on your calculation? Do they overlap with the current answer?

Example 2: a current case where we have nothing, MONDO:0014001:
{
   "edges": {
      "e00": {
         "subject":   "n00",
         "object":    "n01"
      }
   },
   "nodes": {
      "n00": {
         "ids":        ["MONDO:0014001"]
      },
      "n01": {
         "categories":  ["biolink:Protein"]
      }
   }
}

This returns nothing. What are the top 50 NGD links from your computation? Are there any?

edeutsch avatar Jun 07 '21 17:06 edeutsch

Thanks @dkoslicki and @edeutsch. Based on the curie_to_pmids_v1.0_KG2.6.3.sqlite database, there are a total of 22,464 UniProtKB proteins and 13,689 MONDO curies.

Here are the statistics of the NGD calculation: only 11,743 MONDO curies have at least one valid NGD score, and only 20,467 UniProtKB curies have at least one valid NGD score.

count    4.156227e+07
mean     3.728171e-01
std      1.346087e-01
min      2.382131e-03
25%      2.755543e-01
50%      3.507656e-01
75%      4.456183e-01
max      1.204312e+00
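For reference, a pair has no usable score when the two curies share no PMIDs, since the co-occurrence count is zero and its logarithm is undefined. Here is a minimal sketch of the standard Normalized Google Distance formula over PMID sets. The function name, the toy PMID sets, and the corpus size N are assumptions for illustration, not the values or code ARAX uses:

```python
import math

def ngd(pmids_x, pmids_y, n_total):
    """Normalized Google Distance from two sets of PMIDs.

    Returns None when either set is empty or the sets do not overlap,
    because log(0) is undefined (treating that case as "no valid score"
    is an assumption of this sketch).
    """
    fx, fy = len(pmids_x), len(pmids_y)
    fxy = len(pmids_x & pmids_y)
    if fx == 0 or fy == 0 or fxy == 0:
        return None
    log_fx, log_fy = math.log(fx), math.log(fy)
    return (max(log_fx, log_fy) - math.log(fxy)) / (
        math.log(n_total) - min(log_fx, log_fy)
    )

N = 30_000_000  # hypothetical corpus size
disease = {1, 2, 3, 4}
protein = {3, 4, 5}
print(ngd(disease, protein, N))  # small positive value: the terms co-occur
print(ngd(disease, {9, 10}, N))  # None: no shared PMIDs
```

Under this formula a score can exceed 1 whenever f(x) * f(y) / f(x, y) is larger than N, so a maximum slightly above 1 is not by itself a sign of a bug.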

Here is the distribution of all NGD scores for all MONDOs against all 20k Swiss-Prot reviewed proteins:

[image: Screen Shot 2021-06-07 at 2 16 35 PM]

For example 1: MONDO:0013989, here are the top 50 proteins:

MONDO protein ngd_score
MONDO:0013989 UniProtKB:Q6UVM3 0.135096
MONDO:0013989 UniProtKB:Q15822 0.139354
MONDO:0013989 UniProtKB:Q9H936 0.156994
MONDO:0013989 UniProtKB:P17787 0.161801
MONDO:0013989 UniProtKB:Q9P2E7 0.175193
MONDO:0013989 UniProtKB:O43526 0.180483
MONDO:0013989 UniProtKB:O76039 0.184613
MONDO:0013989 UniProtKB:O43307 0.203182
MONDO:0013989 UniProtKB:P61764 0.203552
MONDO:0013989 UniProtKB:Q86Y07 0.207103
MONDO:0013989 UniProtKB:Q07699 0.215191
MONDO:0013989 UniProtKB:Q8N7X2 0.215684
MONDO:0013989 UniProtKB:Q96MP8 0.216447
MONDO:0013989 UniProtKB:Q5RIA9 0.218315
MONDO:0013989 UniProtKB:Q9H1X3 0.218315
MONDO:0013989 UniProtKB:Q9H2S1 0.218374
MONDO:0013989 UniProtKB:Q9P2G4 0.220513
MONDO:0013989 UniProtKB:Q96H35 0.220513
MONDO:0013989 UniProtKB:Q96MA6 0.222343
MONDO:0013989 UniProtKB:Q13303 0.223599
MONDO:0013989 UniProtKB:Q9NX38 0.223663
MONDO:0013989 UniProtKB:Q9BS92 0.224072
MONDO:0013989 UniProtKB:Q5VVW2 0.224072
MONDO:0013989 UniProtKB:Q3KQV9 0.224072
MONDO:0013989 UniProtKB:Q5JVG2 0.224072
MONDO:0013989 UniProtKB:Q96LW7 0.224072
MONDO:0013989 UniProtKB:Q86W47 0.225216
MONDO:0013989 UniProtKB:Q8NBV4 0.225563
MONDO:0013989 UniProtKB:Q5THR3 0.225563
MONDO:0013989 UniProtKB:O75121 0.225563
MONDO:0013989 UniProtKB:Q5VXU9 0.225563
MONDO:0013989 UniProtKB:Q6ZW05 0.226913
MONDO:0013989 UniProtKB:Q14929 0.226913
MONDO:0013989 UniProtKB:Q8N228 0.226913
MONDO:0013989 UniProtKB:Q96K62 0.226913
MONDO:0013989 UniProtKB:A2A3K4 0.226913
MONDO:0013989 UniProtKB:Q9P2F6 0.226913
MONDO:0013989 UniProtKB:Q6NUM6 0.226913
MONDO:0013989 UniProtKB:Q56UQ5 0.227683
MONDO:0013989 UniProtKB:Q8N0Z9 0.227683
MONDO:0013989 UniProtKB:Q6ZMW2 0.227683
MONDO:0013989 UniProtKB:Q96NJ1 0.227683
MONDO:0013989 UniProtKB:Q8NFD4 0.227683
MONDO:0013989 UniProtKB:Q6P2C0 0.227683
MONDO:0013989 UniProtKB:Q5T011 0.227914
MONDO:0013989 UniProtKB:Q5VTE6 0.228149
MONDO:0013989 UniProtKB:Q6PF06 0.228149
MONDO:0013989 UniProtKB:Q8N4T4 0.228149
MONDO:0013989 UniProtKB:Q9Y2H8 0.228149
MONDO:0013989 UniProtKB:Q6ZSA7 0.228149

Of the top 5 with NGD returned by ARAX, only UniProtKB:Q6UVM3 is matched. For some reason, UniProtKB:P78508 and UniProtKB:Q9NS40 are not in the curie_to_pmids_v1.0_KG2.6.3.sqlite database. I guess ARAX is probably still using an older version of KG2 rather than 2.6.3.
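The comparison itself can be sketched in a few lines. The ranked list below holds only the first five rows of the table above, and the ARAX set holds only the three CURIEs named in this comment (the full ARAX top 5 is not listed in this thread); the helper name is hypothetical:

```python
# First five precomputed proteins for MONDO:0013989 (from the table above).
precomputed_top = [
    "UniProtKB:Q6UVM3",
    "UniProtKB:Q15822",
    "UniProtKB:Q9H936",
    "UniProtKB:P17787",
    "UniProtKB:Q9P2E7",
]
# Only the ARAX hits mentioned in this comment, not the complete ARAX answer.
arax_hits = {"UniProtKB:Q6UVM3", "UniProtKB:P78508", "UniProtKB:Q9NS40"}

def split_by_overlap(ranked, reference):
    """Return (CURIEs found in both, reference CURIEs absent from ranked)."""
    shared = [curie for curie in ranked if curie in reference]
    missing = sorted(reference - set(ranked))
    return shared, missing

shared, missing = split_by_overlap(precomputed_top, arax_hits)
print(shared)   # ['UniProtKB:Q6UVM3']
print(missing)  # ['UniProtKB:P78508', 'UniProtKB:Q9NS40']
```

Running the same check against the full top 50 per MONDO curie would show how much of the current ARAX answer the precomputed table recovers.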

For example 2: MONDO:0014001, it also doesn't have any NGD scores with any proteins. ARAX also reports the error "No paths were found in {'BTE', 'RTX-KG2'} satisfying qedge e00" when I ran:

{
   "edges": {
      "e00": {
         "subject":   "n00",
         "object":    "n01"
      }
   },
   "nodes": {
      "n00": {
         "ids":        ["MONDO:0014001"]
      },
      "n01": {
         "categories":  ["biolink:Protein"]
      }
   }
}

chunyuma avatar Jun 07 '21 18:06 chunyuma

@chunyuma would you generate the histogram with 0.01 NGD score resolution?

edeutsch avatar Jun 07 '21 18:06 edeutsch

@edeutsch, here is the histogram with 0.01 resolution:

[image: Screen Shot 2021-06-07 at 2 42 55 PM]
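For anyone who wants to rebuild the 0.01-resolution counts without the screenshot, here is a minimal sketch of the binning using only the standard library. It is not the code used to produce the plot, and the function name is an assumption:

```python
import math
from collections import Counter

def histogram(scores, bin_width=0.01):
    """Count scores per bin; keys are the lower edge of each bin."""
    counts = Counter()
    for score in scores:
        # round() guards against float artifacts right at bin edges.
        index = math.floor(round(score / bin_width, 9))
        counts[round(index * bin_width, 10)] += 1
    return dict(sorted(counts.items()))

# A few scores taken from the MONDO:0013989 table earlier in this thread.
scores = [0.135096, 0.139354, 0.156994, 0.161801, 0.175193]
for lower_edge, n in histogram(scores).items():
    print(f"{lower_edge:.2f}-{lower_edge + 0.01:.2f}: {n}")
```

Applied to the full set of scores, the per-bin counts make it easier to pick a cutoff for a meaningfully small NGD.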

chunyuma avatar Jun 07 '21 18:06 chunyuma

For some reason, UniProtKB:P78508 and UniProtKB:Q9NS40 are not in the curie_to_pmids_v1.0_KG2.6.3.sqlite database. I guess ARAX is probably still using an older version of KG2 rather than 2.6.3.

ARAX is still using 2.5.2 since there are still too many issues with the 2.6.x series to deploy I think.

but I'm concerned about P78508. Are you saying that P78508 is not in KG2.6.3? Or there are no PMIDs associated with it?

Either way, this seems concerning and something we should follow up on? P78508 is a classic reviewed UniProtKB/Swiss-Prot protein, available since 1997 with many publications associated with it in UniProtKB. If we lost it, we should figure out why.

https://www.uniprot.org/uniprot/P78508
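One way to tell the two cases apart (curie missing entirely vs. curie present with no PMIDs) is a direct lookup in the sqlite file. The sketch below runs against an in-memory stand-in database: the table name curie_to_pmids, its curie and pmids columns, and the JSON-encoded PMID list are assumptions, not the confirmed schema of curie_to_pmids_v1.0_KG2.6.3.sqlite, and the sample rows are made up.

```python
import json
import sqlite3

def pmid_status(conn, curie):
    """Return 'absent', 'no_pmids', or the number of PMIDs for a curie."""
    row = conn.execute(
        "SELECT pmids FROM curie_to_pmids WHERE curie = ?", (curie,)
    ).fetchone()
    if row is None:
        return "absent"
    pmids = json.loads(row[0]) if row[0] else []
    return len(pmids) if pmids else "no_pmids"

# Stand-in database with made-up rows (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE curie_to_pmids (curie TEXT PRIMARY KEY, pmids TEXT)")
conn.executemany(
    "INSERT INTO curie_to_pmids VALUES (?, ?)",
    [("EXAMPLE:1", json.dumps([111, 222])), ("EXAMPLE:2", json.dumps([]))],
)

print(pmid_status(conn, "EXAMPLE:1"))  # 2
print(pmid_status(conn, "EXAMPLE:2"))  # no_pmids
print(pmid_status(conn, "EXAMPLE:3"))  # absent
```

Pointing sqlite3.connect at the real file and passing UniProtKB:P78508 would answer the question, provided the schema assumptions hold.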

edeutsch avatar Jun 07 '21 18:06 edeutsch

Are you saying that P78508 is not in KG2.6.3? Or there are no PMIDs associated with it?

I think the v2.6.3 NodeSynonymizer clustered UniProtKB:P78508 with MONDO:0010134, and it seems MONDO:0010134 also doesn't have any PMIDs.

n.id: "MONDO:0010134"
n.category: "biolink:Disease"
n.equivalent_curies: ["CHEMBL.TARGET:CHEMBL2146348", "DOID:0060744", "ENSEMBL:ENSG00000091137", "ENSEMBL:ENSG00000168269", "ENSEMBL:ENSG00000177807", "HGNC:3815", "HGNC:6256", "HGNC:8818", "LOINC:LP35578-1", "MEDDRA:10080398", "MESH:C536648", "MONDO:0010134", "NCBIGene:2299", "NCBIGene:3766", "NCBIGene:5172", "NCIT:C121745", "OMIM:274600", "OMIM:601093", "OMIM:602208", "OMIM:605646", "ORPHANET:231422", "ORPHANET:705", "PR:000001979", "PR:000007625", "PR:P78508", "PR:Q12951", "REACT:R-HSA-425403", "REACT:R-HSA-5627850", "REACT:R-HSA-5627857", "REACT:R-HSA-5627860", "REACT:R-HSA-5627865", "REACT:R-HSA-5627873", "REACT:R-HSA-975290", "SNOMED:70348004", "UMLS:C0271829", "UMLS:C1414682", "UMLS:C1416577", "UMLS:C1418445", "UMLS:C3551785", "UniProtKB:O43511", "UniProtKB:P78508", "UniProtKB:Q12951"]
n.publications: ["2-r", "DOI:10.1001/jamaoto.2013.4185", "DOI:10.1002/(sici)1096-8628(20000103)90:1<38::aid-ajmg8>3.0.co", "DOI:10.1002/ajmg.a.20272", "DOI:10.1002/humu.1116", "DOI:10.1002/humu.1238", "DOI:10.1002/humu.20884", "DOI:10.1002/humu.23335", "DOI:10.1002/humu.9043", "DOI:10.1002/j.1460-2075.1994.tb06827.x"]

chunyuma avatar Jun 07 '21 18:06 chunyuma

hmm, I suggest doing your experiment with KG2.5.2, because otherwise we will keep bumping into these KG2.6.x problems when we try to poke a little deeper, and it will be hard to compare against what ARAX can currently produce to understand if we're getting an improvement.

edeutsch avatar Jun 07 '21 19:06 edeutsch

ok, I can do it and should have results tomorrow or the day after tomorrow.

chunyuma avatar Jun 07 '21 19:06 chunyuma