Training data and scripts used for wmt22-cometkiwi-da

Open • rohitk-cognizant opened this issue 1 year ago • 4 comments

Hi Team,

Could you share the training data and training scripts used for wmt22-cometkiwi-da? We want to use them as a reference for training with our own sample data.

rohitk-cognizant • May 02 '24 17:05

Hi @rohitk-cognizant,

To train wmt22-cometkiwi-da, you just have to run:

```bash
comet-train --cfg configs/models/{your_model_config}.yaml
```

Your config should look something like this:

```yaml
unified_metric:
  class_path: comet.models.UnifiedMetric
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.0e-06
    learning_rate: 1.5e-05
    layerwise_decay: 0.95
    encoder_model: XLM-RoBERTa
    pretrained_model: microsoft/infoxlm-large
    sent_layer: mix
    layer_transformation: sparsemax
    word_layer: 24
    loss: mse
    dropout: 0.1
    batch_size: 16
    train_data:
      - TRAIN_DATA.csv
    validation_data:
      - VALIDATION_DATA.csv
    hidden_sizes:
      - 3072
      - 1024
    activations: Tanh
    input_segments:
      - mt
      - src
    word_level_training: False

trainer: ../trainer.yaml
early_stopping: ../early_stopping.yaml
model_checkpoint: ../model_checkpoint.yaml
```
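The `train_data` and `validation_data` entries in this config point to CSV files. As a minimal sketch, assuming (worth verifying against your installed COMET version) that the training reader loads each file with pandas and keeps the columns named in `input_segments` plus a numeric `score` column, a file for this `mt`/`src` config would have the header `src,mt,score`. The example rows and the helpers `write_training_csv` and `read_training_csv` are hypothetical illustrations, not part of COMET.

```python
import csv
from pathlib import Path

# Hypothetical example rows: a source sentence, its machine translation,
# and a human quality score (values are made up for illustration).
ROWS = [
    {"src": "Das ist ein Test.", "mt": "This is a test.", "score": 0.92},
    {"src": "Guten Morgen!", "mt": "Good evening!", "score": -0.35},
]

# Assumed column layout: one column per entry in input_segments
# ("src" and "mt" here) plus a numeric "score" column.
FIELDS = ["src", "mt", "score"]


def write_training_csv(path: Path, rows: list[dict]) -> None:
    """Write rows to a CSV file with a header line, as pandas.read_csv expects."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_training_csv(path: Path) -> list[dict]:
    """Read the file back and check that every row has a numeric score."""
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        float(row["score"])  # raises ValueError if a score is not numeric
    return rows
```

Calling `write_training_csv(Path("TRAIN_DATA.csv"), ROWS)` would produce a file matching the `train_data` entry above; the same layout would apply to `VALIDATION_DATA.csv`.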

ricardorei • May 04 '24 13:05

Hi @ricardorei,

Thanks for the update. Can I use the same training parameters as in the trainer.yaml file on the master branch?

rohitk-cognizant • May 05 '24 12:05

Hmm, maybe you should change them a bit. For example, to train on a single GPU (which is usually faster) with 16-bit precision, use this:

```yaml
  accelerator: gpu
  devices: 1
  # strategy: ddp # Uncomment this line for distributed (multi-GPU) training
  precision: 16
```

You might also want to reduce accumulate_grad_batches from 8 to 2:

```yaml
  accumulate_grad_batches: 2
```
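To see what this change means, note that with gradient accumulation each optimizer step sees batch_size × accumulate_grad_batches × devices samples. The helper below is a hypothetical sketch, not part of COMET or PyTorch Lightning; it takes `batch_size: 16` from the model config and assumes a single GPU, as in the trainer snippet.

```python
def effective_batch_size(batch_size: int, accumulate_grad_batches: int, devices: int = 1) -> int:
    """Return the number of samples that contribute to one optimizer step.

    Each device runs `batch_size` samples per forward/backward pass, and the
    gradients of `accumulate_grad_batches` consecutive passes are accumulated
    before the optimizer updates the weights.
    """
    for name, value in (("batch_size", batch_size),
                        ("accumulate_grad_batches", accumulate_grad_batches),
                        ("devices", devices)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return batch_size * accumulate_grad_batches * devices


if __name__ == "__main__":
    # Compare the two accumulation settings on one GPU with batch_size 16.
    for accum in (8, 2):
        samples = effective_batch_size(16, accum, devices=1)
        print(f"accumulate_grad_batches={accum}: {samples} samples per optimizer step")
```

On one GPU, going from 8 to 2 shrinks the effective batch from 128 to 32 samples, so each epoch performs four times as many (noisier) optimizer updates.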

ricardorei • May 05 '24 13:05

What format should the data be in?

satya77 • Aug 08 '24 19:08