TDC icon indicating copy to clipboard operation
TDC copied to clipboard

Improvements in the context of knowledge graph resources

Open abearab opened this issue 1 year ago • 10 comments

Suggesting a new data function: Knowledge Graph Mastery, https://github.com/mims-harvard/TDC/discussions/23#discussioncomment-7364648

Changes:

  • Add knowledge_graph module
  • Introduce KnowledgeGraph python class
  • Include build_KG function to enable creating graph from scratch
  • Add to_KG method to current primekg in resource module to get PrimeKG as KnowledgeGraph object

Features:

  • Generalized format to create, handle, and manipulate KnowledgeGraph datasets (mimiking the PrimeKG data frame)
  • Enhance maintenance and data integration for future knowledge graphs , e.g. https://github.com/mims-harvard/PrimeKG/pull/12

abearab avatar Aug 04 '23 08:08 abearab

Hi @abearab thanks for the PR! I think this looks like a lightweight approach to accommodate all the future KGs that could be added to the TDC. I have two comments:

  1. The KG class should be probably defined under the tdc/utils folder. Since it is a data function.
  2. Is it possible to modify PrimeKG to inherit from the KG class you defined? This way all the changes made in the KG class can also be applied to PrimeKG.

Otherwise, looks great - happy to discuss more about these two comments if you have any question!

kexinhuang12345 avatar Oct 30 '23 01:10 kexinhuang12345

Thanks @kexinhuang12345

The KG class should be probably defined under the tdc/utils folder. Since it is a data function.

done – https://github.com/mims-harvard/TDC/pull/207/commits/5eee5a6d974f3680920a9073024513dc8ae7b960

Is it possible to modify PrimeKG to inherit from the KG class you defined? This way all the changes made in the KG class can also be applied to PrimeKG.

Yeah, I agree. I used a method option for the current PrimeKG class for converting it to KnowledgeGraph class. I just removed that so the .KG will keep the data for PrimeKG and inherits from the KG class by default. Thanks for the suggestion. see https://github.com/mims-harvard/TDC/pull/207/commits/354cd50c98b5e795be5ca205570b5b6bd844966d

abearab avatar Oct 30 '23 05:10 abearab

hi. can you run the new conda-test GH Action tests? They should be run if you create a new pull request. Also looks like your branch has conflicts. You should resolve those. Closing this for now.

amva13 avatar Mar 05 '24 22:03 amva13

hey @amva13 – thanks for your response.

I got this error, any thoughts?

OK (skipped=1)
/home/runner/work/TDC/TDC
All base tests passed
--- tdc/resource/primekg.py	(original)
+++ tdc/resource/primekg.py	(reformatted)
@@ -12,6 +12,7 @@
 from ..utils.knowledge_graph import KnowledgeGraph
 
 warnings.filterwarnings("ignore")
+
 
 class PrimeKG:
     """PrimeKG data loader class to load the knowledge graph with additional support functions.
@@ -30,9 +31,9 @@
 
         G = nx.Graph()
         for i in self.KG.df.relation.unique():
-            G.add_edges_from(
-                self.KG.df[self.df.relation == i][["x_id", "y_id"]].values, relation=i
-            )
+            G.add_edges_from(self.KG.df[self.df.relation == i][["x_id",
+                                                                "y_id"]].values,
+                             relation=i)
         return G
 
     def get_features(self, feature_type):
@@ -43,7 +44,5 @@
 
     def get_node_list(self, node_type):
         df = self.KG.df
-        return np.unique(
-            df[(df.x_type == node_type)].x_id.unique().tolist()
-            + df[(df.y_type == node_type)].y_id.unique().tolist()
-        )
+        return np.unique(df[(df.x_type == node_type)].x_id.unique().tolist() +
+                         df[(df.y_type == node_type)].y_id.unique().tolist())
--- tdc/utils/knowledge_graph.py	(original)
+++ tdc/utils/knowledge_graph.py	(reformatted)
@@ -5,13 +5,13 @@
 from copy import copy
 
 kg_columns = [
-    'relation', 'display_relation',
-    'x_id', 'x_type', 'x_name', 'x_source',
+    'relation', 'display_relation', 'x_id', 'x_type', 'x_name', 'x_source',
     'y_id', 'y_type', 'y_name', 'y_source'
 ]
 
 
 class KnowledgeGraph:
+
     def __init__(self, df=None):
         if df is not None:
             self.df = df
@@ -29,26 +29,32 @@
     def get_nodes_by_source(self, source):
         # extract x nodes
         x_df = self.df.query(
-            f"x_source == '{source}' | y_source == '{source}'"
-        )[[col for col in self.df.columns if col.startswith("x_")]]
+            f"x_source == '{source}' | y_source == '{source}'")[[
+                col for col in self.df.columns if col.startswith("x_")
+            ]]
 
         for col in x_df.columns:
             x_df = x_df.rename(columns={col: col[2:]})
 
         # extract y nodes
         y_df = self.df.query(
-            f"x_source == '{source}' | y_source == '{source}'"
-        )[[col for col in self.df.columns if col.startswith("y_")]]
+            f"x_source == '{source}' | y_source == '{source}'")[[
+                col for col in self.df.columns if col.startswith("y_")
+            ]]
         for col in y_df.columns:
             y_df = y_df.rename(columns={col: col[2:]})
         # merge x and y nodes and keep only unique nodes
-        out = pd.concat([x_df, y_df], axis=0).query(f'source == "{source}"').drop_duplicates().reset_index(drop=True)
+        out = pd.concat([
+            x_df, y_df
+        ], axis=0).query(f'source == "{source}"').drop_duplicates().reset_index(
+            drop=True)
 
         return out
 
 
-def build_KG(indices, relation, display_relation, x_id, x_type, x_name, x_source, y_id, y_type, y_name, y_source):
-    df = pd.DataFrame('',columns=kg_columns,index=indices)
+def build_KG(indices, relation, display_relation, x_id, x_type, x_name,
+             x_source, y_id, y_type, y_name, y_source):
+    df = pd.DataFrame('', columns=kg_columns, index=indices)
 
     df.relation = relation
     df.display_relation = display_relation
Error: Process completed with exit code 1.

abearab avatar Mar 05 '24 22:03 abearab

Yes this is an error from the linter. It's from your code's formatting. In theory, you should be able to fix by simply replacing the contents of your file with the output from yapf --style=google tdc/utils/knowledge_graph.py for example. here's more info on the linter if helpful https://github.com/google/yapf

amva13 avatar Mar 05 '24 23:03 amva13

your tdc/resource/primekg.py file is also failing so make sure to do the same there.

amva13 avatar Mar 05 '24 23:03 amva13

reopening

amva13 avatar Mar 05 '24 23:03 amva13

Yes this is an error from the linter. It's from your code's formatting. In theory, you should be able to fix by simply replacing the contents of your file with the output from yapf --style=google tdc/utils/knowledge_graph.py for example. here's more info on the linter if helpful https://github.com/google/yapf

Cool, I fixed it.

abearab avatar Mar 05 '24 23:03 abearab

running tests

amva13 avatar Mar 05 '24 23:03 amva13

looks good @abearab will complete review this week.

amva13 avatar Mar 05 '24 23:03 amva13

Hey @abearab . From the looks of it you should be able to separate your edits to PrimeKG into a different class. Can you go ahead and do that? It can inherit from existing PrimeKG if useful. As there are no dependencies on PrimeKG in the rest of the repo, you and anyone wishing to leverage the code you implemented in KnowledgeGraph should be able to use your class directly. As there is not yet extensive testing on PrimeKG, I'd prefer we proceed in this manner. I can approve when this is done.

amva13 avatar Mar 07 '24 18:03 amva13

Hi @amva13 – we have a discussion with @kexinhuang12345 here https://github.com/mims-harvard/TDC/pull/207#issuecomment-1784357449 which motivated me to modify PrimeKG class. This is what Kexin suggested:

Is it possible to modify PrimeKG to inherit from the KG class you defined? This way all the changes made in the KG class can also be applied to PrimeKG.

I'm not sure if I understand your concern, can you clarify please? I'll be more than happy to add a few more commits if needed.

abearab avatar Mar 07 '24 19:03 abearab

Yes, i think it would be great to create a separate KG class and PrimeKG can inherit from it. The reason is that in the future if there is other KG, then we can inherit from that class and reuse it instead of reimplementing the KG loading/processing/analysis shared functions. Let me know if that makes sense!

kexinhuang12345 avatar Mar 07 '24 20:03 kexinhuang12345

@abearab your implementation does not currently have PrimeKG inheriting from KnowledgeGraph/KG (i.e. would look like PrimeKG(KG) for example)

Regardless - please do not change the code in PrimeKG. Please create a separate class, say, PrimeKGDev. There is no need for you to inherit from KG. You can use the exact same code you are using for PrimeKG and copy that over to PrimeKGDev.

PrimeKGDev should offer exactly the same interface as PrimeKG (this should not be difficult if you just copy over the code).

For conflict resolution, I am currently the authority on TDC, but you may ask @marinkaz for clarification if needed.

@kexinhuang12345 please feel free to implement testing on PrimeKG and we can consolidate the two classes, say by changing PrimeKG to inherit directly from PrimeKGDev, accordingly.

Thanks!

amva13 avatar Mar 07 '24 21:03 amva13

@ abearab your implementation does not currently have PrimeKG inheriting from KnowledgeGraph/KG (i.e. would look like PrimeKG(KG) for example)

That's true, now I fixed it. Thanks for the suggestion.

Regardless - please do not change the code in PrimeKG. Please create a separate class, say, PrimeKGDev. There is no need for you to inherit from KG. You can use the exact same code you are using for PrimeKG and copy that over to PrimeKGDev.

PrimeKGDev should offer exactly the same interface as PrimeKG (this should not be difficult if you just copy over the code).

I'm not sure if I understood your points for having both PrimeKGDev and PrimeKG. Can you clarify, please? Or if it's easier for you, maybe you can add that yourself? Let me know what you think.

For conflict resolution, I am currently the authority on TDC, but you may ask @marinkaz for clarification if needed.

Sounds great, I don't think any clarification is needed in this regard. I just referred to our prior conversations with @kexinhuang12345 to make sure we are all on the same page. Looking forward to seeing all upcoming features in TDC!

abearab avatar Mar 08 '24 23:03 abearab

@abearab feel free to merge this at your convenience. lgtm

amva13 avatar Mar 12 '24 16:03 amva13