Differences between BugFixTokenPairs files

Open dangnguyenngochai opened this issue 2 years ago • 2 comments

In the data folder, there are two types of datasets: those named BugFixNoDup_ and those named BugFixTokenPairs_commits. Can you explain the difference between them? If I want to re-run the pre-training experiment on the BugFix data, which dataset should I use?

dangnguyenngochai avatar May 03 '22 16:05 dangnguyenngochai

I believe we used BugFixNoDup_ to generate the dataset. If you want the dataset with context size 3, it is already in the data folder of each trained model.

chenzimin avatar May 04 '22 13:05 chenzimin

Do they differ in any way from the *_commit dataset?

dangnguyenngochai avatar May 04 '22 14:05 dangnguyenngochai