OpenNMT-py
(Again, but different) AssertionError: assert model_dim % head_count == 0
Hello,
I'm a graduate student at Indiana University trying to run OpenNMT-py on one of our supercomputers. I keep hitting the same error described in https://github.com/OpenNMT/OpenNMT-py/issues/952, even though I already made the changes suggested there. Any idea what the issue could be?
The one change I made was switching to a single GPU (the same setup runs on Google Colab just fine).
Before that, I got an error message along the lines of 'A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.' I no longer remember what I did to work around it.
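For reference, here is a minimal sketch (standard `torch` calls only, assuming a single visible GPU) that checks for that kind of mismatch; it is a diagnostic, not the fix itself:

```python
import torch

# Minimal diagnostic sketch: compare the GPU's compute capability with the
# architectures the installed PyTorch wheel was compiled for. An A100
# reports (8, 0), i.e. sm_80; the warning above means sm_80 was missing
# from the wheel's arch list.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU compute capability: sm_{major}{minor}")
    print(f"PyTorch built for: {torch.cuda.get_arch_list()}")
else:
    print("CUDA is not available to this PyTorch build")
```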
new_no.yaml
```yaml
# Data configurations
save_data: drive/MyDrive/MT_DATA/
src_vocab: drive/MyDrive/MT_DATA/vocab.src
tgt_vocab: drive/MyDrive/MT_DATA/vocab.tgt
save_model: drive/MyDrive/MT_DATA/
overwrite: True
data:
    corpus_1:
        path_src: drive/MyDrive/MT_DATA/train_set_11_char.txt
        path_tgt: drive/MyDrive/MT_DATA/train_set_2_char.txt
    valid:
        path_src: drive/MyDrive/MT_DATA/dev_set_11_char.txt
        path_tgt: drive/MyDrive/MT_DATA/dev_set_2_char.txt

# Training settings
save_checkpoint_steps: 10000
valid_steps: 10000
train_steps: 200000

# Batching
bucket_size: 262144
world_size: 1       # Since only one GPU is available
gpu_ranks: [0]      # Adjusted for single GPU
num_workers: 2
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 2048
accum_count: [4]
accum_steps: [0]

# Optimization
model_dtype: "fp16"
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model architecture
encoder_type: transformer
decoder_type: transformer
position_encoding: true
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
```
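As a sanity check on the file itself, here is a hypothetical snippet (assuming PyYAML is installed and that `new_no.yaml` is the file shown above) that reads back the two values the failing assertion depends on:

```python
import yaml

# Hypothetical sanity check: read the config the job is supposed to load
# and verify the divisibility that the failing assertion tests.
with open("new_no.yaml") as f:
    cfg = yaml.safe_load(f)

hidden_size, heads = cfg["hidden_size"], cfg["heads"]  # 512 and 8 here
print(hidden_size, heads, hidden_size % heads == 0)    # 512 8 True
```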
If your model dimension is evenly divisible by the head count, the assertion `model_dim % head_count == 0` cannot fail, regardless of the machine you run on. Since this divisibility test is the only thing that raises that error, hitting it means the configuration the run actually loads is not the one you think it is. Double-check the paths and make sure the config file you are editing is the one being passed to the training command.
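To illustrate the point (a sketch of the dimension arithmetic, not OpenNMT-py's actual attention module): multi-head attention splits `model_dim` evenly across heads, so each head gets `model_dim // head_count` dimensions, and the reshape only works when the division is exact:

```python
import torch

model_dim, head_count = 512, 8           # hidden_size and heads from new_no.yaml
assert model_dim % head_count == 0       # the check that is raising the error
dim_per_head = model_dim // head_count   # 512 // 8 = 64

# Each head works on its own 64-dim slice, so the projected tensor must
# reshape cleanly into (batch, heads, seq_len, dim_per_head):
x = torch.randn(2, 10, model_dim)        # (batch, seq_len, model_dim)
per_head = x.view(2, 10, head_count, dim_per_head).transpose(1, 2)
print(per_head.shape)                    # torch.Size([2, 8, 10, 64])
```

With `hidden_size: 512` and `heads: 8` this passes, so if the assertion still fires, the training run is not reading the file you posted.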
Hello Vincent,
Could you elaborate more? I don't quite understand what you are trying to say.
Thank you, Jim