
Error: masked_head_neighbor.txt not found - How to reproduce recommendation task

Open devinbost opened this issue 11 months ago • 5 comments

I love how Relphormer utilizes text in addition to graph information. I would like to train a model using Relphormer that reproduces your results for the recommendation task. However, I am hitting an error.

Here is a notebook that reproduces the problem: https://colab.research.google.com/drive/1PHGALZo6AkimU5jEq0I__lnoJaWxUhgg?usp=sharing (Run in a T4 GPU runtime.)

It fails at this step: trainer.fit(lit_model, datamodule=data) with the following stack trace:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
[<ipython-input-7-aea1f0bf8188>](https://localhost:8080/#) in <cell line: 14>()
     12 )
     13 
---> 14 trainer.fit(lit_model, datamodule=data)
     15 
     16 # Explore lit_model and its hidden states or proceed with further actions as needed

10 frames
[/content/Relphormer/data/processor.py](https://localhost:8080/#) in _create_examples(self, lines, set_type, data_dir, args)
    435         '''
    436         if not args.pretrain:
--> 437             with open(os.path.join(data_dir, "masked_head_neighbor.txt"), 'r') as file:
    438                 masked_head_neighbor = json.load(file)
    439             with open(os.path.join(data_dir, "masked_tail_neighbor.txt"), 'r') as file:

FileNotFoundError: [Errno 2] No such file or directory: '/content/Relphormer/dataset/fb15k-237/masked_head_neighbor.txt'

I dug into where this masked_head_neighbor.txt file is loaded and written. This is where I am confused.

  1. The stack trace is trying to read the file from processor.py (https://github.com/devinbost/Relphormer/blob/feddb75f4e6e344d3daa83d8048127c3537b2baa/data/processor.py#L436):
        if not args.pretrain:
            with open(os.path.join(data_dir, "masked_head_neighbor.txt"), 'r') as file:
                masked_head_neighbor = json.load(file)
            with open(os.path.join(data_dir, "masked_tail_neighbor.txt"), 'r') as file:
                masked_tail_neighbor = json.load(file)
            print(f'\n \t Not pre-training stage, masked subgraphs loaded.')
        else:
            masked_head_neighbor = []
            masked_tail_neighbor = []

However, even if I set args.pretrain = None, it still appears to be trying to read the "masked_head_neighbor.txt" file.

  2. On the write path, the file is written at global scope in create_neighbor.py (https://github.com/devinbost/Relphormer/blob/10e2fbfcf8a2aed0d9b50f6e896bf9892991b9f2/dataset/create_neighbor.py#L148). However, the only references to create_neighbor appear to be in the README.
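
On the first point, setting args.pretrain = None cannot bypass the read: in Python, `not None` evaluates to True, so the `if not args.pretrain:` branch still runs. Below is a minimal sketch of that truthiness check. `takes_read_branch` is a hypothetical helper, not part of Relphormer; it only mirrors the condition quoted from processor.py.

```python
from types import SimpleNamespace


def takes_read_branch(args):
    # Hypothetical helper: mirrors the `if not args.pretrain:` test
    # quoted from processor.py. True means the neighbor files are opened.
    return not args.pretrain


# Compare a few candidate values for args.pretrain.
for value in (None, False, 0, True, 1):
    args = SimpleNamespace(pretrain=value)
    branch = "reads neighbor files" if takes_read_branch(args) else "skips neighbor files"
    print(repr(value), "->", branch)
```

Under this reading, only a truthy value such as True or 1 reaches the else branch. Whether the rest of the training code behaves correctly for this task with pretrain enabled is not confirmed here.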

So, I have two questions:

  1. If I need this masked_head_neighbor.txt file, how do I create it?
  2. If I don't need this file, how can I skip it?
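
Until the expected behavior is clarified, a pre-flight check before trainer.fit can at least surface the problem earlier and with a clearer message. This is a minimal sketch: the two file names come from the processor.py excerpt above, while `missing_neighbor_files` and the relative `data_dir` value are assumptions made for illustration.

```python
import os

# File names taken from the processor.py excerpt quoted above.
NEIGHBOR_FILES = ("masked_head_neighbor.txt", "masked_tail_neighbor.txt")


def missing_neighbor_files(data_dir):
    """Return the neighbor files that are absent from data_dir (hypothetical helper)."""
    return [
        name
        for name in NEIGHBOR_FILES
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


# Assumed dataset location; adjust to wherever the repository is checked out.
data_dir = "dataset/fb15k-237"
missing = missing_neighbor_files(data_dir)
if missing:
    print(f"Missing in {data_dir}: {', '.join(missing)}")
else:
    print("Both neighbor files are present.")
```

The check does not create the files. Per the link above, they appear to be written by dataset/create_neighbor.py, but how that script should be invoked is not established in this thread.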

Thanks for all your great work on this project. I'm really looking forward to getting it working.

devinbost, Mar 09 '24 05:03

Thank you for your interest in our work! We have noticed your issue, which likely stems from code inconsistencies during the version update of Relphormer. We will replicate and update the code for the recommendation task, but this may take some time (approximately three weeks). We apologize for not being able to immediately update the code, as we are currently in the process of training another language model.

bizhen46766, Mar 09 '24 06:03

I have some bandwidth to help. (My intent is to publish a demo that features Relphormer with vector search and broadcast it on our YouTube channel.)

What should be the expected behavior with the masked_head_neighbor.txt file?

devinbost, Mar 11 '24 12:03

Hi, I'm not entirely sure if I've understood correctly. Do you want to use the Relphormer model to generate vector representations? If that's the case, I will include code for automated retrieval of vector representations in the next version update.

bizhen46766, Mar 11 '24 16:03

Yes, I want to use it to generate vector representations. I have an example here involving node2vec: https://colab.research.google.com/drive/1gmop95YgsCvoAmOVPLnxcLhkn_Os0ude I'd like to produce something like this. I think the example will help you understand what I'm trying to do. With that said, I'm very interested in how different representation models affect the embedding space for vector search. (Vector search is very useful for getting models to scale in production when the embedding space works well for the given problem.)
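
To make the vector search idea concrete, here is a minimal sketch of nearest-neighbor lookup by cosine similarity over entity embeddings. The entity names and three-dimensional vectors are made-up placeholders, not Relphormer output, and `cosine` and `top_k` are hypothetical helpers written for this example.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, embeddings, k=2):
    # Score every stored vector against the query and keep the k most similar.
    scored = [(name, cosine(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder embeddings standing in for vectors exported from a trained model.
embeddings = {
    "entity_a": [1.0, 0.0, 0.0],
    "entity_b": [0.9, 0.1, 0.0],
    "entity_c": [0.0, 1.0, 0.0],
}

for name, score in top_k([1.0, 0.0, 0.0], embeddings, k=2):
    print(name, round(score, 4))
```

A brute-force scan like this is only practical for small collections; a production setup would normally hand the same vectors to a dedicated vector index.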

devinbost, Mar 11 '24 19:03

Thank you for your attention! I understand. We will update the code for the next version based on the node2vec examples you provided.

bizhen46766, Mar 15 '24 03:03

Hello! Sorry for the late reply.

We have updated the script for quickly obtaining the vector representations after model training.

You can run the following script to obtain the KG embeddings, modifying the hyperparameters as needed.

sh getRelphormerVec.sh

We hope this helps you.

bizhen46766, Jul 15 '24 14:07

Hi, if you have any further questions, please contact us.

zxlzr, Jul 16 '24 04:07