CRSLab
KGSF Performance on ReDial dataset
Hi, thanks for sharing such a great project.
I have run a benchmark on the ReDial dataset using KGSF. However, I got worse results than those reported in the original paper.
This is the command I used; all configurations are set to their defaults:
```bash
python run_crslab.py --config config/crs/kgsf/redial.yaml --gpu 0 --save_data --save_system --tensorboard --restore_data
```
I noticed that the default parameters differ from the original paper (paper -> CRSLab default):

- `batch_size`: 32 -> 128
- `epochs` for training the recommendation module: 30 -> 9

Are there any suggested parameters to reproduce the results? I found that a batch size of 32 is extremely slow, and a batch size of 256 led to worse results.
Results log with the default settings
KG pretraining
2022-05-30 17:08:13.563 | INFO | crslab.system.kgsf:pretrain:120 - [Pretrain epoch 2]
2022-05-30 17:08:13.578 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-05-30 17:09:51.380 | INFO | crslab.evaluator.standard:report:98 -
| grad norm | info_loss |
| --- | --- |
| 1.479 | .4573 |
Recommendation
Results from the paper:
- R@1: 0.039
- R@10: 0.183
- R@50: 0.378
CRSLab:
2022-05-30 17:18:57.713 | INFO | crslab.system.kgsf:train_recommender:147 - [Test]
2022-05-30 17:18:57.861 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-05-30 17:18:59.518 | INFO | crslab.evaluator.standard:report:98 -
| hit@1 | hit@10 | hit@50 | info_loss | mrr@1 | mrr@10 | mrr@50 | ndcg@1 | ndcg@10 | ndcg@50 | rec_loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .03522 | .1774 | .3687 | .7471 | .03522 | .07128 | .08048 | .03522 | .09601 | .1385 | 8.069 |
Conversation
Results from the paper:
- Dist-2: 0.289
- Dist-3: 0.434
- Dist-4: 0.519
CRSLab:
2022-05-30 18:44:43.492 | INFO | crslab.system.kgsf:train_conversation:176 - [Test]
2022-05-30 18:44:43.500 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-05-30 18:45:06.845 | INFO | crslab.evaluator.standard:report:98 -
| average | bleu@1 | bleu@2 | bleu@3 | bleu@4 | dist@1 | dist@2 | dist@3 | dist@4 | extreme | f1 | greedy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .7300 | .1671 | .03262 | .01538 | .009669 | .01072 | .1129 | .6729 | 1.855 | .4991 | .2173 | .5993 |
I reran with the parameters from the original paper (batch size: 32 and epochs for the recommender: 30) but was still not able to get similar results.
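For anyone trying the same, the override would look roughly like this in config/crs/kgsf/redial.yaml. This is only a sketch: I'm assuming the YAML keys mirror the nested structure of the config dump at the end of this issue.

```yaml
# Recommendation-module settings matching the original paper (sketch).
# Only epoch and batch_size are changed from the CRSLab defaults; the
# key layout is assumed to follow the dumped config shown below.
rec:
  epoch: 30        # paper setting (CRSLab default: 9)
  batch_size: 32   # paper setting (CRSLab default: 128)
  optimizer:
    name: Adam
    lr: 0.001
```

If the YAML file is laid out differently, adjusting whichever keys control the recommender's epochs and batch size should be equivalent.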
Recommendation
2022-06-01 11:15:52.359 | INFO | crslab.system.kgsf:train_recommender:147 - [Test]
2022-06-01 11:15:52.496 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-01 11:15:56.330 | INFO | crslab.evaluator.standard:report:98 -
| hit@1 | hit@10 | hit@50 | info_loss | mrr@1 | mrr@10 | mrr@50 | ndcg@1 | ndcg@10 | ndcg@50 | rec_loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .03248 | .1334 | .3003 | .5179 | .03248 | .0576 | .06525 | .03248 | .0752 | .1117 | 11.1 |
Conversation
2022-06-01 14:05:02.700 | INFO | crslab.system.kgsf:train_conversation:176 - [Test]
2022-06-01 14:05:02.706 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-01 14:05:39.934 | INFO | crslab.evaluator.standard:report:98 -
| average | bleu@1 | bleu@2 | bleu@3 | bleu@4 | dist@1 | dist@2 | dist@3 | dist@4 | extreme | f1 | greedy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .7089 | .1678 | .03474 | .01939 | .01344 | .01002 | .0951 | .5500 | 1.402 | .4973 | .2243 | .6021 |
Modified `gen_evaluate` due to #42. The results I got so far:
Recommendation
2022-06-07 20:04:28.364 | INFO | crslab.system.kgsf:train_recommender:147 - [Test]
2022-06-07 20:04:28.512 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-07 20:04:30.113 | INFO | crslab.evaluator.standard:report:99 -
| hit@1 | hit@10 | hit@50 | info_loss | mrr@1 | mrr@10 | mrr@50 | ndcg@1 | ndcg@10 | ndcg@50 | rec_loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .03647 | .1791 | .3615 | .6572 | .03647 | .07052 | .07918 | .03647 | .09569 | .1361 | 7.639 |
Conversation
2022-06-07 21:28:06.479 | INFO | crslab.system.kgsf:train_conversation:176 - [Test]
2022-06-07 21:28:06.488 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-07 21:28:29.465 | INFO | crslab.evaluator.standard:report:99 -
| average | bleu@1 | bleu@2 | bleu@3 | bleu@4 | dist@1 | dist@2 | dist@3 | dist@4 | extreme | f1 | greedy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .7250 | .1647 | .03145 | .01426 | .009099 | .3125 | 1.244 | 2.06 | 2.544 | .5078 | .2179 | .5997 |
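A note when comparing these dist@n values with the paper: distinct-n is normalized differently across implementations, and only the corpus-level ratio form is bounded by 1. As a generic reference (this is the standard definition, not necessarily what `gen_evaluate` or the #42 change implements):

$$
\text{Dist-}n = \frac{\#\,\text{distinct generated } n\text{-grams}}{\#\,\text{total generated } n\text{-grams}} \le 1
\qquad \text{vs.} \qquad
\text{Dist-}n = \frac{\#\,\text{distinct generated } n\text{-grams}}{\#\,\text{generated responses}}
$$

Several of the dist@n values above exceed 1, which cannot happen with the ratio form, so the comparison with the paper's Dist-2/3/4 may be across different normalizations.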
Configs

```json
{
  "dataset": "ReDial",
  "tokenize": "nltk",
  "embedding": "word2vec.npy",
  "context_truncate": 256,
  "response_truncate": 30,
  "scale": 1.0,
  "model": "KGSF",
  "token_emb_dim": 300,
  "kg_emb_dim": 128,
  "num_bases": 8,
  "n_heads": 2,
  "n_layers": 2,
  "ffn_size": 300,
  "dropout": 0.1,
  "attention_dropout": 0.0,
  "relu_dropout": 0.1,
  "learn_positional_embeddings": false,
  "embeddings_scale": true,
  "reduction": false,
  "n_positions": 1024,
  "seed": 12345,
  "pretrain": {
    "epoch": 3,
    "batch_size": 128,
    "optimizer": {
      "name": "Adam",
      "lr": 0.001
    }
  },
  "rec": {
    "epoch": 30,
    "batch_size": 128,
    "optimizer": {
      "name": "Adam",
      "lr": 0.001
    },
    "early_stop": true,
    "stop_mode": "min",
    "impatience": 3
  },
  "conv": {
    "epoch": 90,
    "batch_size": 128,
    "optimizer": {
      "name": "Adam",
      "lr": 0.001
    },
    "lr_scheduler": {
      "name": "ReduceLROnPlateau",
      "patience": 3,
      "factor": 0.5
    },
    "gradient_clip": 0.1,
    "early_stop": true,
    "stop_mode": "min",
    "impatience": 3
  },
  "gpu": [0],
  "model_name": "KGSF"
}
```
@icedpanda Thank you very much for your efforts in reproducing the results. We will try our best to ensure that the results are consistent with the paper. However, due to differences in data processing, such as entity linking, among different models, some differences in model performance are inevitable. We will continue to optimize the models in the future.