
[QUESTION] About COMET Train and Test Data

Open moore3930 opened this issue 1 year ago • 0 comments

Hi, I have some questions about the dataset provided here: https://github.com/Unbabel/COMET/tree/master/data

  1. If I understand correctly, the training data for each WMT year is accumulated from all previous WMT years. Does this mean that, among the files here (https://github.com/Unbabel/COMET/tree/master/data), the DA data for 2021 is a subset of the DA data for 2022?

  2. The training data in this config (https://github.com/Unbabel/COMET/blob/master/configs/models/referenceless_model.yaml) is set to data/1720-da.csv. Does this mean the file is simply a merge of all the DA data here (https://github.com/Unbabel/COMET/tree/master/data) from 2017 to 2020? If so, are there any duplication issues?

  3. Where can I find the test data if I want to formally test my metric on the WMT21 DA Task?
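
For questions 1 and 2, here is a minimal sketch of how the subset and duplication checks could be run locally once the CSV files are downloaded. It uses only the Python standard library. The column names `lp`, `src`, `mt`, and `ref` are an assumption about the file layout and should be adjusted to the actual header, the helper names are hypothetical, and the toy rows are made up for illustration, not taken from the real data.

```python
import csv
import io
from collections import Counter

# Assumed identifying columns; adjust to match the real CSV header.
KEY_COLUMNS = ("lp", "src", "mt", "ref")


def load_keys(handle, key_columns=KEY_COLUMNS):
    """Read a CSV from an open file handle and return one key tuple per row."""
    reader = csv.DictReader(handle)
    return [tuple(row[col] for col in key_columns) for row in reader]


def duplicate_keys(keys):
    """Return a dict mapping each key that occurs more than once to its count."""
    return {key: n for key, n in Counter(keys).items() if n > 1}


def is_subset(smaller, larger):
    """True if every key in `smaller` also appears in `larger`."""
    return set(smaller) <= set(larger)


# Toy stand-ins for two yearly files (made-up rows, not real WMT data).
toy_2021 = (
    "lp,src,mt,ref,score\n"
    "en-de,Hello,Hallo,Hallo,0.5\n"
)
toy_2022 = (
    "lp,src,mt,ref,score\n"
    "en-de,Hello,Hallo,Hallo,0.5\n"
    "en-de,Bye,Tschuess,Tschuess,0.1\n"
    "en-de,Bye,Tschuess,Tschuess,0.3\n"
)

keys_2021 = load_keys(io.StringIO(toy_2021))
keys_2022 = load_keys(io.StringIO(toy_2022))

print("2021 subset of 2022:", is_subset(keys_2021, keys_2022))
print("duplicates in 2022:", duplicate_keys(keys_2022))
```

With real files, the `io.StringIO(...)` handles would be replaced by `open(path, newline="", encoding="utf-8")`. Matching on the text columns only, and not on the score, means the same segment annotated twice with different scores is still reported as a duplicate.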

moore3930, Jan 06 '25 00:01