Relphormer
Error: masked_head_neighbor.txt not found - How to reproduce recommendation task
I love how Relphormer utilizes text in addition to graph information. I would like to train a model using Relphormer that reproduces your results for the recommendation task. However, I am hitting an error.
Here is a notebook that reproduces the problem: https://colab.research.google.com/drive/1PHGALZo6AkimU5jEq0I__lnoJaWxUhgg?usp=sharing (Run in a T4 GPU runtime.)
It fails at this step:
trainer.fit(lit_model, datamodule=data)
with the following stack trace:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-7-aea1f0bf8188> in <cell line: 14>()
12 )
13
---> 14 trainer.fit(lit_model, datamodule=data)
15
16 # Explore lit_model and its hidden states or proceed with further actions as needed
10 frames
/content/Relphormer/data/processor.py in _create_examples(self, lines, set_type, data_dir, args)
435 '''
436 if not args.pretrain:
--> 437 with open(os.path.join(data_dir, "masked_head_neighbor.txt"), 'r') as file:
438 masked_head_neighbor = json.load(file)
439 with open(os.path.join(data_dir, "masked_tail_neighbor.txt"), 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/content/Relphormer/dataset/fb15k-237/masked_head_neighbor.txt'
I dug into where this masked_head_neighbor.txt file is loaded and written. This is where I am confused.
- The stack trace is trying to read the file from processor.py (https://github.com/devinbost/Relphormer/blob/feddb75f4e6e344d3daa83d8048127c3537b2baa/data/processor.py#L436):
    if not args.pretrain:
        with open(os.path.join(data_dir, "masked_head_neighbor.txt"), 'r') as file:
            masked_head_neighbor = json.load(file)
        with open(os.path.join(data_dir, "masked_tail_neighbor.txt"), 'r') as file:
            masked_tail_neighbor = json.load(file)
        print(f'\n \t Not pre-training stage, masked subgraphs loaded.')
    else:
        masked_head_neighbor = []
        masked_tail_neighbor = []
However, even if I set args.pretrain = None, it still appears to be trying to read the "masked_head_neighbor.txt" file.
- On the write path, the file is written at the global scope in create_neighbor.py (https://github.com/devinbost/Relphormer/blob/10e2fbfcf8a2aed0d9b50f6e896bf9892991b9f2/dataset/create_neighbor.py#L148). However, the only references to create_neighbor appear to be in the README.
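A note on the read path: in Python, `not None` evaluates to True, so setting args.pretrain = None still takes the branch that opens the file. The sketch below only illustrates that truthiness rule; `takes_read_branch` is a hypothetical stand-in for the condition in processor.py, and `Namespace` stands in for the real args object.

```python
from argparse import Namespace

def takes_read_branch(args):
    # Mirrors the `if not args.pretrain:` condition quoted above.
    # Hypothetical helper, not part of the Relphormer code base.
    return not args.pretrain

# None, 0 and False are all falsy, so each one selects the file-reading branch.
for value in (None, 0, False, 1, True):
    print(repr(value), takes_read_branch(Namespace(pretrain=value)))
```

Under this reading, only a truthy value such as 1 or True would select the else branch with the empty lists. Whether that is the right setting for the recommendation task is a separate question.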
So, I have two questions:
- If I need this masked_head_neighbor.txt file, how do I create it?
- If I don't need this file, how can I skip it?
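While those questions are open, a small pre-flight check can at least surface the problem before trainer.fit is called. This is a minimal sketch: the two file names come from the traceback above, while `missing_neighbor_files` is a hypothetical helper that does not exist in the repository.

```python
import os
import tempfile

# File names taken from the traceback above.
REQUIRED_FILES = ("masked_head_neighbor.txt", "masked_tail_neighbor.txt")

def missing_neighbor_files(data_dir):
    """Hypothetical helper: list the required files that are absent from data_dir."""
    return [
        name for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(data_dir, name))
    ]

# Demo on a throwaway directory rather than the real dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_neighbor_files(tmp))  # both files are missing at first
    open(os.path.join(tmp, REQUIRED_FILES[0]), "w").close()
    print(missing_neighbor_files(tmp))  # only the tail file is still missing
```

In the notebook, the same call could be pointed at the dataset directory (for example /content/Relphormer/dataset/fb15k-237, as shown in the error message) before training starts.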
Thanks for all your great work on this project. I'm really looking forward to getting it working.
Thank you for your interest in our work! We have noticed your issue, which likely stems from code inconsistencies during the version update of Relphormer. We will replicate and update the code for the recommendation task, but this may take some time (approximately three weeks). We apologize for not being able to immediately update the code, as we are currently in the process of training another language model.
I have some bandwidth to help. (My intent is to publish a demo that features Relphormer with vector search and broadcast it on our YouTube channel.)
What should be the expected behavior with the masked_head_neighbor.txt file?
Hi, I'm not entirely sure if I've understood correctly. Do you want to use the Relphormer model to generate vector representations? If that's the case, I will include code for automated retrieval of vector representations in the next version update.
Yes, I want to use it to generate vector representations. I have an example here involving node2vec: https://colab.research.google.com/drive/1gmop95YgsCvoAmOVPLnxcLhkn_Os0ude I'd like to produce something like this. I think the example will help you understand what I'm trying to do. With that said, I'm very interested in how different representation models affect the embedding space for vector search. (Vector search is very useful for getting models to scale in production when the embedding space works well for the given problem.)
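To make the vector search idea concrete, here is a minimal sketch of a cosine-similarity lookup over entity embeddings. The entity names and two-dimensional vectors are made-up toy values, not Relphormer output, and `cosine` and `top_k` are illustrative helpers. A production setup would typically use a dedicated vector index instead of this brute-force scan.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query, embeddings, k=2):
    # Score every stored vector against the query and keep the k best.
    scored = [(name, cosine(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings for illustration only.
toy = {
    "entity_a": [1.0, 0.0],
    "entity_b": [0.9, 0.1],
    "entity_c": [0.0, 1.0],
}
print(top_k([1.0, 0.0], toy))
```

How well such a lookup works depends on the embedding space, which is why comparing representation models (node2vec, Relphormer, and others) on the same search task is interesting.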
Thank you for your attention! I understand. We will update the code for the next version based on the node2vec examples you provided.
Hello! Sorry for the late reply.
We have updated the script for quickly obtaining the vector representations after model training.
You can run the following script to obtain the KG embeddings, after modifying the hyperparameters as needed.
sh getRelphormerVec.sh
We hope this helps you.
Hi, if you have any further questions, please contact us.