OpenHGNN

"dblp4HAN" dataset bug

Open luoxc007 opened this issue 2 years ago • 1 comments

🐛 Bug

When I ran "python -u /home/wj/dgl/OpenHGNN-main/main.py -m HAN -d dblp4HAN -t node_classification -g 6 --use_best_config --load_from_pretrained" with openhgnn, I got the error "UnboundLocalError: local variable '_dataset' referenced before assignment".

To Reproduce

Steps to reproduce the behavior:

1. Run the command shown above.

Expected behavior

  1. I traced the code and found that the source uses a chain of "elif" branches to distinguish dataset names, but has no final "else". When an invalid dataset name is passed, the error therefore complains about the unassigned variable rather than about the dataset name that was given. Adding an "else" branch that reports the invalid name would improve the error message.
  2. "dblp4HAN" is listed in the README.md file in openhgnn/dataset, but no such dataset actually exists. Could that file be updated?
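
A minimal sketch of suggestion 1, under stated assumptions: the function name `build_dataset`, the returned dictionaries, and the particular names handled are hypothetical and are not OpenHGNN's actual builder code. It only illustrates how a final "else" turns the UnboundLocalError into an error that names the bad input.

```python
# Hypothetical sketch: build_dataset and the returned dicts are illustrative,
# not OpenHGNN's real dataset-building code.
def build_dataset(name):
    if name == "acm4GTN":
        dataset = {"name": name, "family": "GTN"}
    elif name == "dblp4GTN":
        dataset = {"name": name, "family": "GTN"}
    elif name == "acm_han_raw":
        dataset = {"name": name, "family": "HAN"}
    else:
        # Without this branch, `dataset` is never assigned for an unknown
        # name, and the return below raises UnboundLocalError instead.
        raise ValueError(f"Unsupported dataset name: {name!r}")
    return dataset
```

With this branch in place, passing "dblp4HAN" would fail with a ValueError that mentions "dblp4HAN" directly.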

Environment

  • OpenHGNN Version (e.g., 1.0):
  • Backend Library & Version (e.g., PyTorch 0.4.1, DGL 0.7.0):
  • OS (e.g., Linux):
  • Running command you used (e.g., python main.py -m GTN -d imdb4GTN -t node_classification -g 0 --use_best_config):
  • Model configuration you used (e.g., details of the model configuration you used in config.ini):
  • Python version: 3.8
  • CUDA/cuDNN version (if applicable):
  • GPU models and configuration (e.g. V100):
  • Any other relevant information:

Additional context

None.

luoxc007 avatar Oct 07 '22 05:10 luoxc007

dblp4HAN seems to be unavailable now, and your suggestions sound reasonable. Please use other datasets such as dblp4GTN, acm_han_raw, etc.

Zhanghyi avatar Oct 08 '22 03:10 Zhanghyi

I want to use dblp4GTN, but I ran into a problem. The dataset 'acm4GTN' includes metapath embeddings, such as 'pspap_m2v_emb', but I cannot find any metapath embeddings for dblp4GTN. As a result, I can use 'acm4GTN' in HGSL, but not 'dblp4GTN'.

wwddd66 avatar Mar 08 '23 15:03 wwddd66

You are right. The metapath2vec embedding is not available for dblp4GTN. One solution is to run metapath2vec on the dataset to generate the embedding and then assign it as a node feature before running HGSL.

Zhanghyi avatar Mar 08 '23 15:03 Zhanghyi

For example, you can set meta_path_key to APCPA in the config.ini file and then run the following command:

    python main.py -m Metapath2vec -t node_classification -d dblp4GTN -g -1

This will output the embeddings of authors in the output/metapath2vec directory.

Zhanghyi avatar Mar 08 '23 15:03 Zhanghyi

Yes, I got the 'APCPA' embeddings, but the embedding has shape (18405, 128), which covers all nodes in dblp4GTN. Can Metapath2vec generate embeddings for the target node type only, or how can I extract the target-type embeddings from the full matrix? What is the node order in the 'APCPA' embeddings, e.g. is author range(0, 4057), conference range(4057, 4077), and paper range(4077, 18405)?

wwddd66 avatar Mar 09 '23 07:03 wwddd66

@Zhanghyi

wwddd66 avatar Mar 09 '23 07:03 wwddd66

The embedding file contains the embeddings of all node types, in the same order as g.ntypes. We will update the relevant documentation for clarity.
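
As a rough sketch of how the per-type rows could be located given that ordering: the helper `type_slices` below is hypothetical, and the node counts are the ones quoted in the question above (they sum to 18405), not values read from the dataset. It also assumes that rows within each type follow node ID order, which this thread does not confirm. With a DGL graph `g`, the inputs would come from `g.ntypes` and `g.num_nodes(ntype)`.

```python
def type_slices(ntypes, num_nodes):
    """Map each node type to its row range in the stacked embedding matrix."""
    slices, start = {}, 0
    for ntype in ntypes:
        stop = start + num_nodes[ntype]
        slices[ntype] = slice(start, stop)
        start = stop
    return slices


# Assumed values, taken from the question in this thread; check g.ntypes
# and g.num_nodes(ntype) on the real graph before relying on them.
ntypes = ["author", "conference", "paper"]
num_nodes = {"author": 4057, "conference": 20, "paper": 14328}
slices = type_slices(ntypes, num_nodes)

# Given emb, the (18405, 128) array loaded from the output file:
# author_emb = emb[slices["author"]]
```

The resulting slices can index a NumPy array or a tensor directly, so the target-type embeddings are a single row slice of the full matrix.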

Zhanghyi avatar Mar 09 '23 07:03 Zhanghyi

Thanks!

wwddd66 avatar Mar 09 '23 07:03 wwddd66