
Replicating the target domain training results

Open dangnguyenngochai opened this issue 2 years ago • 8 comments

Hello @chenzimin, could you help me take a look at this log? Somehow my results are better than yours, even though I used a smaller architecture that is half the size. Here is my log: log file

dangnguyenngochai avatar May 06 '22 16:05 dangnguyenngochai

Hi @dangnguyenngochai,

I could not open the log file.

chenzimin avatar May 09 '22 11:05 chenzimin

Hello @chenzimin ,

I re-uploaded the log file to Pastebin. Please help me take a look: link

dangnguyenngochai avatar May 13 '22 08:05 dangnguyenngochai

I can see the log with the new link. The training accuracy is really high (almost 100%), but the validation accuracy is only 49.948 (log line: [2022-05-06 01:14:23,334 INFO] Validation accuracy: 49.948). So it seems like the model is overfitting to the training data.

chenzimin avatar May 13 '22 12:05 chenzimin

This log is the training log for the target domain only. Isn't it too good to be true compared to the results you reported in your paper?

dangnguyenngochai avatar May 13 '22 13:05 dangnguyenngochai

I believe that OpenNMT-py reports per token accuracy, whereas we calculate the sequence accuracy.
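As a rough illustration of why the two numbers can differ so much, here is a minimal sketch (not code from the VRepair repository, and not how OpenNMT-py computes its figure internally). The function names and the toy data are hypothetical; the sketch assumes whitespace-tokenized predictions and references, one sequence per string.

```python
def token_accuracy(predictions, references):
    """Fraction of reference tokens matched position by position.

    Simplified illustration only; OpenNMT-py computes its own
    per-token accuracy during training and validation.
    """
    correct = 0
    total = 0
    for pred, ref in zip(predictions, references):
        pred_tokens = pred.split()
        ref_tokens = ref.split()
        total += len(ref_tokens)
        # zip stops at the shorter sequence, so missing tokens count as wrong
        correct += sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return correct / total if total else 0.0


def sequence_accuracy(predictions, references):
    """Fraction of predictions that match the whole reference sequence."""
    if not references:
        return 0.0
    exact = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return exact / len(references)


# Hypothetical toy data: two of three predictions have one wrong token.
predictions = ["a b c d", "x y z", "int i = 0 ;"]
references = ["a b c d", "x y w", "int i = 1 ;"]

print(f"token accuracy:    {token_accuracy(predictions, references):.3f}")     # 0.833
print(f"sequence accuracy: {sequence_accuracy(predictions, references):.3f}")  # 0.333
```

A single wrong token makes the whole sequence count as a miss, so sequence accuracy is usually much lower than per-token accuracy on the same output.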

chenzimin avatar May 14 '22 07:05 chenzimin

Do you have the code for calculating sequence-level accuracy? I cannot seem to find it in the repo. Or do I have to reimplement it myself?

dangnguyenngochai avatar May 14 '22 07:05 dangnguyenngochai

We use the src/compare.py file to compare the output of our model when given test samples with the expected sequence for the sample. For example: "python ../../src/compare.py --src=xlate.txt --tgt=tgt-test.txt -v > pass.txt".

The usage for compare.py is:

Usage: python compare.py --src [src_file] --tgt [tgt_file] [-v]
The -v option will print out all passing cases.
Example: python compare.py --src pred-test.txt --tgt tgt-test.txt

SteveKommrusch avatar May 15 '22 01:05 SteveKommrusch

One more thing: you can use src/find_best_model_and_translate_config.py to generate the command that produces the predictions from the model with the best validation accuracy.

chenzimin avatar May 17 '22 13:05 chenzimin