CRSLab
KGSF Performance on ReDial dataset
Hi, thanks for sharing such a great project.
I have run a benchmark on the ReDial dataset using KGSF. However, I got worse results than those reported in the original paper.
This is the command I used; all configurations are set to their defaults:
```bash
python run_crslab.py --config config/crs/kgsf/redial.yaml --gpu 0 --save_data --save_system --tensorboard --restore_data
```
I noticed that the default parameters differ from the original paper (paper -> CRSLab default):

- `batch_size`: 32 -> 128
- `epochs` for training the recommendation module: 30 -> 9

Are there any suggested parameters to reproduce the results? I found that a batch size of 32 is extremely slow, and a batch size of 256 led to worse results.
Results log with the default settings
KG pretraining
2022-05-30 17:08:13.563 | INFO | crslab.system.kgsf:pretrain:120 - [Pretrain epoch 2]
2022-05-30 17:08:13.578 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-05-30 17:09:51.380 | INFO | crslab.evaluator.standard:report:98 -
| grad norm | info_loss |
| --- | --- |
| 1.479 | .4573 |
Recommendation
Results from the paper:
- R@1: 0.039
- R@10: 0.183
- R@50: 0.378
CRSLab:
2022-05-30 17:18:57.713 | INFO | crslab.system.kgsf:train_recommender:147 - [Test]
2022-05-30 17:18:57.861 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-05-30 17:18:59.518 | INFO | crslab.evaluator.standard:report:98 -
| hit@1 | hit@10 | hit@50 | info_loss | mrr@1 | mrr@10 | mrr@50 | ndcg@1 | ndcg@10 | ndcg@50 | rec_loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .03522 | .1774 | .3687 | .7471 | .03522 | .07128 | .08048 | .03522 | .09601 | .1385 | 8.069 |
Conversation
Results from the paper:
- Dist-2: 0.289
- Dist-3: 0.434
- Dist-4: 0.519
CRSLab:
2022-05-30 18:44:43.492 | INFO | crslab.system.kgsf:train_conversation:176 - [Test]
2022-05-30 18:44:43.500 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-05-30 18:45:06.845 | INFO | crslab.evaluator.standard:report:98 -
| average | bleu@1 | bleu@2 | bleu@3 | bleu@4 | dist@1 | dist@2 | dist@3 | dist@4 | extreme | f1 | greedy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .7300 | .1671 | .03262 | .01538 | .009669 | .01072 | .1129 | .6729 | 1.855 | .4991 | .2173 | .5993 |
I reran with the parameters from the original paper (batch size: 32 and epochs for the recommender: 30) but was still not able to get similar results.
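For anyone trying the same, the override would look roughly like this in config/crs/kgsf/redial.yaml. This is only a sketch: I'm assuming the YAML keys mirror the nested structure of the config dump at the end of this issue.

```yaml
# Recommendation-module settings matching the original paper (sketch).
# Only epoch and batch_size are changed from the CRSLab defaults; the
# key layout is assumed to follow the dumped config shown below.
rec:
  epoch: 30        # paper setting (CRSLab default: 9)
  batch_size: 32   # paper setting (CRSLab default: 128)
  optimizer:
    name: Adam
    lr: 0.001
```

If the YAML file is laid out differently, adjusting whichever keys control the recommender's epochs and batch size should be equivalent.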
Recommendation
2022-06-01 11:15:52.359 | INFO | crslab.system.kgsf:train_recommender:147 - [Test]
2022-06-01 11:15:52.496 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-01 11:15:56.330 | INFO | crslab.evaluator.standard:report:98 -
| hit@1 | hit@10 | hit@50 | info_loss | mrr@1 | mrr@10 | mrr@50 | ndcg@1 | ndcg@10 | ndcg@50 | rec_loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .03248 | .1334 | .3003 | .5179 | .03248 | .0576 | .06525 | .03248 | .0752 | .1117 | 11.1 |
Conversation
2022-06-01 14:05:02.700 | INFO | crslab.system.kgsf:train_conversation:176 - [Test]
2022-06-01 14:05:02.706 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-01 14:05:39.934 | INFO | crslab.evaluator.standard:report:98 -
| average | bleu@1 | bleu@2 | bleu@3 | bleu@4 | dist@1 | dist@2 | dist@3 | dist@4 | extreme | f1 | greedy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .7089 | .1678 | .03474 | .01939 | .01344 | .01002 | .0951 | .5500 | 1.402 | .4973 | .2243 | .6021 |
Modified `gen_evaluate` due to #42. The results I got so far:
Recommendation
2022-06-07 20:04:28.364 | INFO | crslab.system.kgsf:train_recommender:147 - [Test]
2022-06-07 20:04:28.512 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-07 20:04:30.113 | INFO | crslab.evaluator.standard:report:99 -
| hit@1 | hit@10 | hit@50 | info_loss | mrr@1 | mrr@10 | mrr@50 | ndcg@1 | ndcg@10 | ndcg@50 | rec_loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .03647 | .1791 | .3615 | .6572 | .03647 | .07052 | .07918 | .03647 | .09569 | .1361 | 7.639 |
Conversation
2022-06-07 21:28:06.479 | INFO | crslab.system.kgsf:train_conversation:176 - [Test]
2022-06-07 21:28:06.488 | INFO | crslab.data.dataloader.base:get_data:54 - [Finish dataset process before batchify]
2022-06-07 21:28:29.465 | INFO | crslab.evaluator.standard:report:99 -
| average | bleu@1 | bleu@2 | bleu@3 | bleu@4 | dist@1 | dist@2 | dist@3 | dist@4 | extreme | f1 | greedy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .7250 | .1647 | .03145 | .01426 | .009099 | .3125 | 1.244 | 2.06 | 2.544 | .5078 | .2179 | .5997 |
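A note when comparing these dist@n values with the paper: distinct-n is normalized differently across implementations, and only the corpus-level ratio form is bounded by 1. As a generic reference (this is the standard definition, not necessarily what `gen_evaluate` or the #42 change implements):

$$
\text{Dist-}n = \frac{\#\,\text{distinct generated } n\text{-grams}}{\#\,\text{total generated } n\text{-grams}} \le 1
\qquad \text{vs.} \qquad
\text{Dist-}n = \frac{\#\,\text{distinct generated } n\text{-grams}}{\#\,\text{generated responses}}
$$

Several of the dist@n values above exceed 1, which cannot happen with the ratio form, so the comparison with the paper's Dist-2/3/4 may be across different normalizations.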
Configs

```json
{
  "dataset": "ReDial",
  "tokenize": "nltk",
  "embedding": "word2vec.npy",
  "context_truncate": 256,
  "response_truncate": 30,
  "scale": 1.0,
  "model": "KGSF",
  "token_emb_dim": 300,
  "kg_emb_dim": 128,
  "num_bases": 8,
  "n_heads": 2,
  "n_layers": 2,
  "ffn_size": 300,
  "dropout": 0.1,
  "attention_dropout": 0.0,
  "relu_dropout": 0.1,
  "learn_positional_embeddings": false,
  "embeddings_scale": true,
  "reduction": false,
  "n_positions": 1024,
  "seed": 12345,
  "pretrain": {
    "epoch": 3,
    "batch_size": 128,
    "optimizer": {
      "name": "Adam",
      "lr": 0.001
    }
  },
  "rec": {
    "epoch": 30,
    "batch_size": 128,
    "optimizer": {
      "name": "Adam",
      "lr": 0.001
    },
    "early_stop": true,
    "stop_mode": "min",
    "impatience": 3
  },
  "conv": {
    "epoch": 90,
    "batch_size": 128,
    "optimizer": {
      "name": "Adam",
      "lr": 0.001
    },
    "lr_scheduler": {
      "name": "ReduceLROnPlateau",
      "patience": 3,
      "factor": 0.5
    },
    "gradient_clip": 0.1,
    "early_stop": true,
    "stop_mode": "min",
    "impatience": 3
  },
  "gpu": [0],
  "model_name": "KGSF"
}
```
@icedpanda Thank you very much for your efforts in reproducing the results. We will try our best to ensure that the results are consistent with the paper. However, due to differences in data processing, such as entity linking, among different models, some differences in model performance are inevitable. We will continue to optimize the models in the future.