
How to evaluate on the ZINC dataset?

Open dongZheX opened this issue 2 years ago • 9 comments

Thanks for the code. Good job. I trained graphormer_slim on the ZINC dataset with bash examples/property_prediction/zinc.sh. Then I tried to evaluate it with:

python graphormer/evaluate/evaluate.py \
--user-dir graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name zinc \
--dataset-source pyg \
--task graph_prediction \
--criterion l1_loss \
--arch graphormer_slim \
--num-classes 1 \
--batch-size 64 \
--save-dir exp/checkpoints_dir/ckpts_zinc \
--metric mae \
--split test

An error happened:

TypeError: mean() received an invalid combination of arguments - got (out=NoneType, dtype=NoneType, axis=NoneType, ), but expected one of:
 * (*, torch.dtype dtype)
 * (tuple of ints dim, bool keepdim, *, torch.dtype dtype)
 * (tuple of names dim, bool keepdim, *, torch.dtype dtype)

I changed the code from mae = np.mean(np.abs(y_true - y_pred)) to mae = torch.nn.functional.l1_loss(y_true, y_pred).
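For context: the TypeError occurs because np.mean, when handed a torch tensor, forwards its axis/dtype/out keywords to torch.Tensor.mean, which rejects that combination (exactly the signatures listed in the traceback). A minimal sketch of two equivalent fixes, with random tensors standing in for the y_true/y_pred values in evaluate.py:

import numpy as np
import torch

# Stand-ins for the labels and model outputs in evaluate.py (hypothetical values).
y_true = torch.randn(5000)
y_pred = torch.randn(5000)

# Option 1: convert to numpy arrays first, so np.mean never touches a tensor.
mae_np = np.mean(np.abs(y_true.cpu().numpy() - y_pred.cpu().numpy()))

# Option 2: stay in torch, as in the change quoted above
# (l1_loss defaults to reduction='mean', i.e. the MAE).
mae_torch = torch.nn.functional.l1_loss(y_true, y_pred).item()

assert abs(mae_np - mae_torch) < 1e-6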

But the MAE I got on the ZINC dataset is:

2022-01-10 14:40:42 | INFO | graphormer.tasks.graph_prediction | Loaded test with #samples: 5000
2022-01-10 14:40:46 | INFO | __main__ | mae: 0.06235151365399361 

Is this result normal? I suspect I made a mistake somewhere.

dongZheX avatar Jan 10 '22 07:01 dongZheX

@dongZheX Thanks for using Graphormer. I'll fix the problem in the script.

Do you use the full ZINC dataset, or the subset mentioned in the Graphormer paper?

shiyu1994 avatar Jan 10 '22 15:01 shiyu1994

Thanks very much for replying. I think I used the ZINC subset; the result of Graphormer in the paper is 0.122±0.006, but the result I got is 0.0623, so I think I made a mistake somewhere.

By the way, benchmarking-gnns (https://github.com/graphdeeplearning/benchmarking-gnns) requires training the model with seeds {41, 12, 35, 92}. Do I have to train Graphormer four times and evaluate it four times? Is there any way to get the final result (mean ± std) more efficiently? (Same question for ogbg-molhiv.)
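As far as this thread shows, there is no built-in multi-seed runner, so the straightforward route is one training plus evaluation run per seed (fairseq's --seed flag, visible as seed=1 in the config dump later in this thread, would control the seed) and aggregating the four test MAEs by hand. A minimal sketch of the aggregation, with placeholder MAE values rather than measured results:

import numpy as np

# Seeds required by the benchmarking-gnns protocol.
seeds = [41, 12, 35, 92]

# Placeholder: the test MAE printed by evaluate.py for each per-seed run
# (each run assumed to use its own --seed and --save-dir).
maes = [0.122, 0.118, 0.125, 0.121]

# ddof=1 gives the sample standard deviation, as usually reported.
print(f"ZINC test MAE: {np.mean(maes):.3f} ± {np.std(maes, ddof=1):.3f}")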

dongZheX avatar Jan 10 '22 16:01 dongZheX

Hi, how did you successfully run bash examples/property_prediction/zinc.sh? I ran this command for half an hour and got no response. [screenshot omitted]

skye95git avatar Mar 17 '22 08:03 skye95git

@dongZheX Did you download the dataset in advance?

skye95git avatar Mar 17 '22 09:03 skye95git

@dongZheX How many epochs did you run? I used the zinc.sh script and ran 10000 epochs on the ZINC subset, but got an undesirable result (an MAE around 0.6), which leaves a large margin to the reported result.

ZhuYun97 avatar Apr 18 '22 06:04 ZhuYun97

I also reproduced a result of 0.06+. What is the reason for the inconsistency? I used zinc.sh to train the model and the same test command dongZheX used above.

2022-06-20 17:56:39 | INFO | __main__ | evaluating checkpoint file examples/property_prediction/ckpts/zinc_graphormer_slim/checkpoint50.pt
2022-06-20 17:56:40 | INFO | graphormer.models.graphormer | Namespace(no_progress_bar=False, log_interval=100, log_format=None, log_file=None, tensorboard_logdir=None, wandb_project=None, azureml_logging=False, seed=1, cpu=False, tpu=False, bf16=False, memory_efficient_bf16=False, fp16=False, memory_efficient_fp16=False, fp16_no_flatten_grads=False, fp16_init_scale=128, fp16_scale_window=None, fp16_scale_tolerance=0.0, on_cpu_convert_precision=False, min_loss_scale=0.0001, threshold_loss_scale=None, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, user_dir='graphormer', empty_cache_freq=0, all_gather_list_size=16384, model_parallel_size=1, quantization_config_path=None, profile=False, reset_logging=False, suppress_crashes=False, use_plasma_view=False, plasma_path='/tmp/plasma', criterion='l1_loss', tokenizer=None, bpe=None, optimizer=None, lr_scheduler='fixed', scoring='bleu', task='graph_prediction', num_workers=16, skip_invalid_size_inputs_valid_test=False, max_tokens=None, batch_size=64, required_batch_size_multiple=8, required_seq_len_multiple=1, dataset_impl=None, data_buffer_size=10, train_subset='train', valid_subset='valid', combine_valid_subsets=None, ignore_unused_valid_subsets=False, validate_interval=1, validate_interval_updates=0, validate_after_updates=0, fixed_validation_seed=None, disable_validation=False, max_tokens_valid=None, batch_size_valid=64, max_valid_steps=None, curriculum=0, gen_subset='test', num_shards=1, shard_id=0, grouped_shuffling=False, update_epoch_batch_itr=False, update_ordered_indices_seed=False, distributed_world_size=1, distributed_num_procs=1, distributed_rank=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, device_id=0, distributed_no_spawn=False, ddp_backend='legacy_ddp', ddp_comm_hook='none', bucket_cap_mb=25, fix_batches_to_gpus=False, find_unused_parameters=False, gradient_as_bucket_view=False, fast_stat_sync=False, heartbeat_timeout=-1, broadcast_buffers=False, slowmo_momentum=None, slowmo_base_algorithm='localsgd', localsgd_frequency=3, nprocs_per_node=1, pipeline_model_parallel=False, pipeline_balance=None, pipeline_devices=None, pipeline_chunks=0, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_checkpoint='never', zero_sharding='none', no_reshard_after_forward=False, fp32_reduce_scatter=False, cpu_offload=False, use_sharded_state=False, not_fsdp_flatten_parameters=False, arch='graphormer_slim', max_epoch=0, max_update=0, stop_time_hours=0, clip_norm=0.0, sentence_avg=False, update_freq=[1], lr=[0.25], stop_min_lr=-1.0, use_bmuf=False, skip_remainder_batch=False, save_dir='examples/property_prediction/ckpts/zinc_graphormer_slim', restore_file='checkpoint_last.pt', finetune_from_model=None, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, optimizer_overrides='{}', save_interval=1, save_interval_updates=0, keep_interval_updates=-1, keep_interval_updates_pattern=-1, keep_last_epochs=-1, keep_best_checkpoints=-1, no_save=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_save_optimizer_state=False, best_checkpoint_metric='loss', maximize_best_checkpoint_metric=False, patience=-1, checkpoint_suffix='', checkpoint_shard_count=1, load_checkpoint_on_all_dp_ranks=False, write_checkpoints_asynchronously=False, store_ema=False, ema_decay=0.9999, ema_start_update=0, ema_seed_model=None, ema_update_freq=1, ema_fp32=False, split='test', metric='mae', dataset_name='zinc', num_classes=1, max_nodes=128, dataset_source='pyg', num_atoms=4608, num_edges=1536, num_in_degree=512, num_out_degree=512, num_spatial=512, num_edge_dis=128, multi_hop_max_dist=5, spatial_pos_max=1024, edge_type='multi_hop', pretrained_model_name='none', load_pretrained_model_output_layer=False, train_epoch_shuffle=False, user_data_dir='', force_anneal=None, lr_shrink=0.1, warmup_updates=0, pad=1, eos=2, unk=3, no_seed_provided=False, encoder_embed_dim=80, encoder_layers=12, encoder_attention_heads=8, encoder_ffn_embed_dim=80, activation_fn='gelu', encoder_normalize_before=True, apply_graphormer_init=True, share_encoder_input_output_embed=False, no_token_positional_embeddings=False, pre_layernorm=False, dropout=0.1, attention_dropout=0.1, act_dropout=0.0, _name='graphormer_slim')
2022-06-20 17:56:41 | INFO | graphormer.tasks.graph_prediction | Loaded test with #samples: 5000
2022-06-20 17:56:44 | INFO | __main__ | mae: 0.0699552521109581

czczup avatar Jun 20 '22 10:06 czczup

https://github.com/pyg-team/pytorch_geometric/blob/97c50a03db9f5e9fbb0ab42d38681cac0d2a020a/torch_geometric/datasets/zinc.py#L64

Note that the ZINC dataset has both a full set and a subset. The current version of our code corresponds to the full set, hence the mismatch between your reproduced 0.069 MAE and the results in our paper. You can specify which one to use via this argument.
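For concreteness, a minimal sketch of the distinction using the linked torch_geometric.datasets.ZINC class directly (the root paths are placeholders, and how Graphormer's own data registry forwards this flag is not shown here):

from torch_geometric.datasets import ZINC

# subset=True loads the ~12k-graph benchmark subset behind the 0.122±0.006
# paper number; subset=False (the default) loads the full ~250k-graph set.
zinc_full = ZINC(root="dataset/zinc_full", subset=False, split="test")
zinc_subset = ZINC(root="dataset/zinc_subset", subset=True, split="test")

# The "#samples: 5000" in the evaluation logs above matches the full set's
# test split; the subset's test split has 1000 graphs.
print(len(zinc_full), len(zinc_subset))  # 5000 1000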

lsj2408 avatar Jun 23 '22 11:06 lsj2408

Thanks for your reply~

czczup avatar Jun 23 '22 12:06 czczup

Hello, just a quick question for clarification purposes: what modification needs to be made in the zinc.sh file in order to train on the subset instead of the full set? Do I actually go to that zinc.py file and change the subset argument from False to True?

JiaYuanChng avatar Jun 28 '22 12:06 JiaYuanChng