Discrepancies in DiffCSE Code Execution and Reported Results: Seeking Insight

Open jasl1 opened this issue 2 years ago • 0 comments

I ran the DiffCSE source code on a Tesla V100-SXM2-32GB GPU, using the same configuration as specified in the run_diffcse.sh file. The results I obtained, shown in the log below, differ from those reported in your paper and in your GitHub repository. For example, the average STS score (Spearman correlation × 100) is 3.24 points lower in my run: 78.49 reported versus 75.25 obtained (78.49 - 75.25 = 3.24).
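For reference, the averages and the gap can be re-derived from the per-task scores in the "Eval results" block of the log below. This is a minimal sketch: it assumes that eval_avg_sts is the unweighted mean over the seven STS tasks (STS12 to STS16, STS-B, SICK-R) and that eval_avg_transfer is the unweighted mean over the seven transfer tasks, which is consistent with the logged numbers. The variable names are illustrative and not taken from the DiffCSE code.

```python
# Per-task scores copied from the "Eval results" block of the log.
sts_scores = {
    "STS12": 0.6466070114897755,
    "STS13": 0.7940081784855644,
    "STS14": 0.7106309581907064,
    "STS15": 0.8022190201969241,
    "STS16": 0.7800045550188356,
    "STSBenchmark": 0.8002343091893147,
    "SICKRelatedness": 0.734116144071677,
}
transfer_scores = {
    "MR": 82.03, "CR": 87.52, "SUBJ": 95.39, "MPQA": 88.73,
    "SST2": 87.84, "TREC": 84.15, "MRPC": 74.93,
}


def mean(values):
    """Unweighted arithmetic mean (assumed averaging scheme)."""
    values = list(values)
    return sum(values) / len(values)


avg_sts = mean(sts_scores.values())            # fraction in [0, 1]
avg_transfer = mean(transfer_scores.values())  # already in percent

reported_avg_sts = 78.49  # figure quoted in this issue, in points
gap = reported_avg_sts - 100 * avg_sts

print(f"avg STS (x100): {100 * avg_sts:.2f}")
print(f"avg transfer:   {avg_transfer:.2f}")
print(f"gap to reported avg STS: {gap:.2f} points")
```

Both recomputed averages agree with the logged eval_avg_sts and eval_avg_transfer values, so the gap comes from the per-task scores themselves rather than from how the average is formed.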

Do you have any insights or suggestions about what might cause this performance gap when reproducing the results with the released code? (@voidism)


[INFO|trainer.py:358] 2023-09-21 19:27:21,467 >> Using amp fp16 backend
09/21/2023 19:27:21 - INFO - __main__ -   *** Evaluate ***
tasks:  ['STSBenchmark', 'SICKRelatedness', 'STS12', 'STS13', 'STS14', 'STS15', 'STS16', 'MR', 'CR', 'SUBJ', 'MPQA', 'SST2', 'MRPC', 'TREC']
./SentEval/senteval/sts.py:42: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  sent1 = np.array([s.split() for s in sent1])[not_empty_idx]
./SentEval/senteval/sts.py:43: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  sent2 = np.array([s.split() for s in sent2])[not_empty_idx]
09/21/2023 19:27:54 - INFO - root -   Generating sentence embeddings
09/21/2023 19:28:02 - INFO - root -   Generated sentence embeddings
09/21/2023 19:28:02 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:28:10 - INFO - root -   Best param found at split 1: l2reg = 0.001                 with score 82.31
09/21/2023 19:28:20 - INFO - root -   Best param found at split 2: l2reg = 0.001                 with score 81.99
09/21/2023 19:28:32 - INFO - root -   Best param found at split 3: l2reg = 0.0001                 with score 82.27
09/21/2023 19:28:42 - INFO - root -   Best param found at split 4: l2reg = 0.01                 with score 81.54
09/21/2023 19:28:53 - INFO - root -   Best param found at split 5: l2reg = 0.0001                 with score 82.04
09/21/2023 19:28:54 - INFO - root -   Generating sentence embeddings
09/21/2023 19:28:56 - INFO - root -   Generated sentence embeddings
09/21/2023 19:28:56 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:28:59 - INFO - root -   Best param found at split 1: l2reg = 1e-05                 with score 87.81
09/21/2023 19:29:03 - INFO - root -   Best param found at split 2: l2reg = 0.0001                 with score 88.15
09/21/2023 19:29:07 - INFO - root -   Best param found at split 3: l2reg = 1e-05                 with score 87.32
09/21/2023 19:29:11 - INFO - root -   Best param found at split 4: l2reg = 1e-05                 with score 87.05
09/21/2023 19:29:15 - INFO - root -   Best param found at split 5: l2reg = 0.0001                 with score 87.25
09/21/2023 19:29:15 - INFO - root -   Generating sentence embeddings
09/21/2023 19:29:23 - INFO - root -   Generated sentence embeddings
09/21/2023 19:29:23 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:29:32 - INFO - root -   Best param found at split 1: l2reg = 0.001                 with score 95.22
09/21/2023 19:29:42 - INFO - root -   Best param found at split 2: l2reg = 1e-05                 with score 95.51
09/21/2023 19:29:52 - INFO - root -   Best param found at split 3: l2reg = 0.0001                 with score 95.31
09/21/2023 19:30:01 - INFO - root -   Best param found at split 4: l2reg = 0.001                 with score 95.45
09/21/2023 19:30:09 - INFO - root -   Best param found at split 5: l2reg = 0.0001                 with score 95.46
09/21/2023 19:30:10 - INFO - root -   Generating sentence embeddings
09/21/2023 19:30:12 - INFO - root -   Generated sentence embeddings
09/21/2023 19:30:12 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:30:21 - INFO - root -   Best param found at split 1: l2reg = 0.001                 with score 89.16
09/21/2023 19:30:29 - INFO - root -   Best param found at split 2: l2reg = 1e-05                 with score 88.19
09/21/2023 19:30:37 - INFO - root -   Best param found at split 3: l2reg = 0.001                 with score 88.91
09/21/2023 19:30:45 - INFO - root -   Best param found at split 4: l2reg = 0.001                 with score 88.44
09/21/2023 19:30:54 - INFO - root -   Best param found at split 5: l2reg = 0.001                 with score 88.93
09/21/2023 19:30:55 - INFO - root -   Computing embedding for train
09/21/2023 19:31:22 - INFO - root -   Computed train embeddings
09/21/2023 19:31:22 - INFO - root -   Computing embedding for dev
09/21/2023 19:31:23 - INFO - root -   Computed dev embeddings
09/21/2023 19:31:23 - INFO - root -   Computing embedding for test
09/21/2023 19:31:24 - INFO - root -   Computed test embeddings
09/21/2023 19:31:24 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with standard validation..
09/21/2023 19:31:36 - INFO - root -   [('reg:1e-05', 87.73), ('reg:0.0001', 87.84), ('reg:0.001', 87.61), ('reg:0.01', 86.93)]
09/21/2023 19:31:36 - INFO - root -   Validation : best param found is reg = 0.0001 with score             87.84
09/21/2023 19:31:36 - INFO - root -   Evaluating...
09/21/2023 19:31:39 - INFO - root -   ***** Transfer task : MRPC *****


09/21/2023 19:31:39 - INFO - root -   Computing embedding for train
09/21/2023 19:31:45 - INFO - root -   Computed train embeddings
09/21/2023 19:31:45 - INFO - root -   Computing embedding for test
09/21/2023 19:31:47 - INFO - root -   Computed test embeddings
09/21/2023 19:31:47 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with 5-fold cross-validation
09/21/2023 19:31:51 - INFO - root -   [('reg:1e-05', 74.85), ('reg:0.0001', 74.85), ('reg:0.001', 74.93), ('reg:0.01', 74.07)]
09/21/2023 19:31:51 - INFO - root -   Cross-validation : best param found is reg = 0.001             with score 74.93
09/21/2023 19:31:51 - INFO - root -   Evaluating...
09/21/2023 19:31:52 - INFO - root -   ***** Transfer task : TREC *****


09/21/2023 19:31:54 - INFO - root -   Computed train embeddings
09/21/2023 19:31:54 - INFO - root -   Computed test embeddings
09/21/2023 19:31:54 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with 5-fold cross-validation
09/21/2023 19:32:00 - INFO - root -   [('reg:1e-05', 84.15), ('reg:0.0001', 84.02), ('reg:0.001', 83.47), ('reg:0.01', 76.76)]
09/21/2023 19:32:00 - INFO - root -   Cross-validation : best param found is reg = 1e-05             with score 84.15
09/21/2023 19:32:00 - INFO - root -   Evaluating...
09/21/2023 19:32:00 - INFO - __main__ -   ***** Eval results *****
09/21/2023 19:32:00 - INFO - __main__ -     STS12 = 0.6466070114897755
09/21/2023 19:32:00 - INFO - __main__ -     STS13 = 0.7940081784855644
09/21/2023 19:32:00 - INFO - __main__ -     STS14 = 0.7106309581907064
09/21/2023 19:32:00 - INFO - __main__ -     STS15 = 0.8022190201969241
09/21/2023 19:32:00 - INFO - __main__ -     STS16 = 0.7800045550188356
09/21/2023 19:32:00 - INFO - __main__ -     eval_CR = 87.52
09/21/2023 19:32:00 - INFO - __main__ -     eval_MPQA = 88.73
09/21/2023 19:32:00 - INFO - __main__ -     eval_MR = 82.03
09/21/2023 19:32:00 - INFO - __main__ -     eval_MRPC = 74.93
09/21/2023 19:32:00 - INFO - __main__ -     eval_SST2 = 87.84
09/21/2023 19:32:00 - INFO - __main__ -     eval_SUBJ = 95.39
09/21/2023 19:32:00 - INFO - __main__ -     eval_TREC = 84.15
09/21/2023 19:32:00 - INFO - __main__ -     eval_avg_sts = 0.7525457395203998
09/21/2023 19:32:00 - INFO - __main__ -     eval_avg_transfer = 85.79857142857144
09/21/2023 19:32:00 - INFO - __main__ -     eval_sickr_spearman = 0.734116144071677
09/21/2023 19:32:00 - INFO - __main__ -     eval_stsb_spearman = 0.8002343091893147
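A side note on the VisibleDeprecationWarning raised from SentEval/senteval/sts.py in the log above: the message itself suggests passing dtype=object when building an array from token lists of different lengths. The sketch below illustrates that pattern on toy data. The sentences and the way not_empty_idx is built here are hypothetical and not copied from SentEval, and nothing here is claimed to explain the score gap.

```python
import numpy as np

# Toy stand-ins for the sentence list in sts.py (hypothetical data).
sent1 = ["a short sentence", "", "another slightly longer sentence here"]
tokens = [s.split() for s in sent1]

# Hypothetical mask marking the non-empty entries.
not_empty_idx = np.array([len(t) > 0 for t in tokens])

# Token lists have different lengths, so request an object array explicitly.
# Without dtype=object, older NumPy versions emit the warning seen in the log,
# and NumPy 1.24+ raises a ValueError instead.
arr = np.array(tokens, dtype=object)
kept = arr[not_empty_idx]

print(kept.shape)
print(list(kept))
```

One caveat: if every token list happened to have the same length, np.array(..., dtype=object) would build a 2-D array rather than a 1-D array of lists, so this pattern relies on the input being ragged.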

Sep 21 '23 20:09 jasl1
