
New user questions

felipemello1 opened this issue 3 years ago · 3 comments

Hi all, thanks for making this library available. I am trying to use it for my benchmarks, but I am having a bit of trouble.

I want to evaluate my own dataset for recommendation. On the website there is an example only for node classification, so I started digging through the Git repository and found an example for link_prediction under examples/customization.

I decided to settle on link_prediction, because I don't know what the equivalent of AsLinkPredictionDataset would be for recommendation.

I want to compute hits@k, but it is not clear where to change the metric: I couldn't find it as an input of AsLinkPredictionDataset, in config.ini, or in OpenHGNN, so I have no idea how to change it.

In the OGB benchmarks, hits@k is computed by providing a negative set and a positive set and comparing scores_pos > scores_neg. Maybe this could become part of the link_prediction pipeline to support hits@k?
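
For context, the OGB link-prediction evaluator does roughly the following (my own minimal sketch of their logic; the tensor names are made up):

import torch

def hits_at_k(scores_pos, scores_neg, k):
    # a positive edge counts as a hit if its score beats the
    # k-th highest score among the negative edges
    kth_neg_score = torch.topk(scores_neg, k).values[-1]
    return (scores_pos > kth_neg_score).float().mean().item()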

I could also calculate the metric on my own if I could save the predictions, but it is not clear how to do inference or access the model after it is trained. I couldn't find it in the tutorials or examples.

In summary, it would be nice to have:

  1. A tutorial for recommendation systems;
  2. Documentation on how to change metrics;
  3. A way to calculate hits@k in OGB style, comparing scores against a given negative set (like the sketch above);
  4. A way to save/load the trained model and do inference.

Thanks very much!! Felipe

My code:

import torch as th
from openhgnn.dataset import AsLinkPredictionDataset, generate_random_hg
from dgl import transforms as T
from dgl import DGLHeteroGraph
from dgl.data import DGLDataset
from dgl.dataloading.negative_sampler import GlobalUniform
import os
import numpy as np
meta_paths_dict = {}  # e.g. {'APA': [('author', 'author-paper', 'paper'), ('paper', 'rev_author-paper', 'author')]}
target_link = [('DRUG', 'DRUG_DIS', 'DIS')]

class MySplitLPDatasetWithNegEdges(DGLDataset):
    def __init__(self):
        super().__init__(name='my-split-lp-dataset-with-neg-edges',
                         force_reload=True)

    def process(self):
        # load a pickled (heterograph, negative-edges dict) pair
        hg, neg_edges = np.load('pathtomydataset.npy', allow_pickle=True)
        self._neg_val_edges, self._neg_test_edges = neg_edges['valid'], neg_edges['test']
        self._g = hg

    @property
    def neg_val_edges(self):
        return self._neg_val_edges

    @property
    def neg_test_edges(self):
        return self._neg_test_edges

    @property
    def meta_paths_dict(self):
        return meta_paths_dict

    def __getitem__(self, idx):
        return self._g

    def __len__(self):
        return 1


def train_with_custom_lp_dataset(dataset):
    from openhgnn.config import Config
    from openhgnn.start import OpenHGNN
    config_file = ["../../openhgnn/config.ini"]
    config = Config(file_path=config_file, model='RGCN', dataset=dataset, task='link_prediction', gpu=-1)
    OpenHGNN(args=config)

if __name__ == '__main__':
    mySplitLPDatasetWithNegEdges = AsLinkPredictionDataset(MySplitLPDatasetWithNegEdges(), target_link=target_link,
                                                           target_link_r=None,
                                                           force_reload=True)
    train_with_custom_lp_dataset(mySplitLPDatasetWithNegEdges)

felipemello1 · Jun 17 '22 20:06

Thank you for your comments.

  1. So far we have only one model, KGCN, that supports recommendation systems. We will consider a tutorial after the number of relevant models grows.
  2. We have not yet designed an interface for this. If you do want to change the metric, you can modify the trainerflow: self.task.get_evaluator() is the place to change it.
  3. We do not have the hits@k metric yet, but we have already implemented some knowledge graph models that include a hits@10 metric; I think those can serve as a reference. The models are here.
  4. The trained models are saved in openhgnn/output/{model name}, but our system does not support directly loading models and doing inference. Users may load the models and perform downstream tasks themselves, e.g. along the lines of the sketch below.
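
A rough sketch of what that could look like (this is not an official API; it assumes the saved .pt file holds a state_dict, and build_my_model, hg, and the checkpoint path are placeholders for your own construction code, graph, and output file):

import torch

# rebuild the model with the same configuration used for training
model = build_my_model()  # placeholder for your own construction code
state = torch.load('openhgnn/output/RGCN/checkpoint.pt', map_location='cpu')  # hypothetical path
model.load_state_dict(state)
model.eval()
with torch.no_grad():
    scores = model(hg)  # placeholder forward pass on your heterograph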

All right, we may consider these as part of our future plans for OpenHGNN. Thank you again.

dddg617 · Jun 24 '22 01:06

Thanks for your reply @dddg617 !

Regarding your last point, I am afraid that the current pipeline doesn't save the models, at least not when using the script under examples/customization. It only saves the logs.

I checked your code for parts doing something like torch.save and checkpoint, and apparently these are only called if early stopping happens. But I only looked at the code briefly, so I might be wrong.

felipemello1 · Jun 24 '22 02:06

All right, for the last point: currently we do not support saving models in examples/customization, but we do support it in openhgnn/trainerflow. If you run the script in our previous way (see the command below), you will get a .pt file in openhgnn/output/{model name}. We will add the same function to examples/customization.
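
For reference, the previous way is the main.py entry point described in the README, for example (model, dataset, and GPU id here are placeholders):

python main.py -m RGCN -d HGBl-amazon -t link_prediction -g -1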

dddg617 · Jun 26 '22 01:06