COMET
[QUESTION] About COMET Train and Test Data
Hi, I have some questions about the dataset provided here: https://github.com/Unbabel/COMET/tree/master/data
- If I understand correctly, the training data for each year's WMT accumulates the data from all previous WMTs. Does this mean that, in the data folder (https://github.com/Unbabel/COMET/tree/master/data), the 2021 DA data is a subset of the 2022 DA data?
- The training data in the referenceless model config (https://github.com/Unbabel/COMET/blob/master/configs/models/referenceless_model.yaml) is set to data/1720-da.csv. Does this mean it is simply a merge of all the DA data from 2017 to 2020 in the data folder (https://github.com/Unbabel/COMET/tree/master/data)? Are there any duplication issues?
- Where can I find the test data if I want to formally evaluate my metric on the WMT21 DA task?
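For the first two points, here is a minimal stdlib-only sketch of how I would check them locally. It assumes the DA CSVs have header columns named `lp`, `src`, `mt` and `ref` (the actual headers should be checked), and it treats two rows as the same segment when those four fields match, ignoring the score. The helper names and the in-memory stand-ins for files like a 2021 or 2022 DA CSV are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Dict, Tuple

# Assumption: these column names identify a segment; the real CSVs may differ.
KEY_COLUMNS = ("lp", "src", "mt", "ref")


def load_rows(handle) -> List[Dict[str, str]]:
    """Read a DA CSV (any text file object) into a list of row dicts."""
    return list(csv.DictReader(handle))


def row_key(row: Dict[str, str]) -> Tuple[str, ...]:
    """Build a segment key from language pair, source, hypothesis and reference."""
    return tuple(row[col] for col in KEY_COLUMNS)


def is_subset(smaller: List[Dict[str, str]], larger: List[Dict[str, str]]) -> bool:
    """Return True if every segment key in `smaller` also appears in `larger`."""
    larger_keys = {row_key(r) for r in larger}
    return all(row_key(r) in larger_keys for r in smaller)


def merge_and_dedup(
    tables: Iterable[List[Dict[str, str]]],
) -> Tuple[List[Dict[str, str]], int]:
    """Concatenate tables, keeping the first occurrence of each segment key.

    Returns the merged rows and the number of duplicate rows dropped.
    """
    seen = set()
    merged: List[Dict[str, str]] = []
    dropped = 0
    for table in tables:
        for row in table:
            key = row_key(row)
            if key in seen:
                dropped += 1  # same segment already seen in an earlier table
                continue
            seen.add(key)
            merged.append(row)
    return merged, dropped


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real yearly DA files (hypothetical contents).
    da_2021 = load_rows(io.StringIO("lp,src,mt,ref,score\nen-de,a,b,c,0.5\n"))
    da_2022 = load_rows(
        io.StringIO("lp,src,mt,ref,score\nen-de,a,b,c,0.5\nen-de,x,y,z,0.1\n")
    )
    print(is_subset(da_2021, da_2022))  # True
    merged, dropped = merge_and_dedup([da_2021, da_2022])
    print(len(merged), dropped)  # 2 1
```

With real data, one would open each yearly CSV from the data folder, pass them to `merge_and_dedup`, and compare the result against data/1720-da.csv. Keying on the segment rather than the full row means that the same segment scored differently in two years counts as a duplicate, which is the case I am most unsure about.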