Getting an error when trying to pre-train for three languages
I am using the command below:
python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model ai4bharat/IndicBART --tokenizer_name_or_path ai4bharat/IndicBART --langs hi,kn,bn --mono_src /home/aniruddha/all_data/train.hi,/home/aniruddha/all_data/train.kn,/home/aniruddha/all_data/train.bn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path aibharat/IndicBART/model --port 7878
Using label smoothing of 0.1
Using gradient clipping norm of 1.0
Using softmax temperature of 1.0
Masking ratio: 0.3
Training for: ['hi', 'kn', 'bn']
Shuffling corpus!
Shuffling corpus!
Shuffling corpus!
Saving the model
Loading from checkpoint
Traceback (most recent call last):
File "pretrain_nmt.py", line 968, in
-- Process 0 terminated with the following error: Traceback (most recent call last): File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in wrap fn(i, *args) File "/home/aniruddha/yanmtt/pretrain_nmt.py", line 521, in model_create_load_run_save lprobs, labels, args.label_smoothing, ignore_index=tok.pad_token_id File "/home/aniruddha/yanmtt/common_utils.py", line 147, in label_smoothed_nll_loss smooth_loss.masked_fill(pad_mask, 0.0) RuntimeError: The expanded size of the tensor (316) must match the existing size (315) at non-singleton dimension 1. Target sizes: [8, 316, 1]. Tensor sizes: [8, 315, 1]
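For context, the failure is a broadcast-shape mismatch between the model's log-probabilities and the labels along the sequence dimension. A minimal standalone reproduction of the same family of PyTorch error (illustrative only, not yanmtt's actual code):

```python
import torch

# masked_fill requires the mask to be broadcastable with the tensor it is
# applied to. If the labels are one token longer than the log-probs
# (316 vs. 315 here), the fill fails with this kind of RuntimeError.
smooth_loss = torch.zeros(8, 315, 1)
pad_mask = torch.zeros(8, 316, 1, dtype=torch.bool)
smooth_loss.masked_fill(pad_mask, 0.0)  # raises RuntimeError: expanded size mismatch
```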
Have you converted the scripts of the non-Devanagari languages to Devanagari?
Look here: https://github.com/AI4Bharat/indic-bart
That's likely the reason.
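For reference, the conversion the IndicBART repo describes can be done with the Indic NLP Library; a minimal sketch (assuming the indic-nlp-library package is installed, and with a made-up Bengali sample line):

```python
from indicnlp.transliterate.unicode_transliterate import UnicodeIndicTransliterator

# Map Bengali-script text to Devanagari so it matches IndicBART's
# Devanagari-only vocabulary; "kn" -> "hi" works the same way for Kannada.
bn_line = "আমার নাম রাজ"  # hypothetical line from train.bn
deva_line = UnicodeIndicTransliterator.transliterate(bn_line, "bn", "hi")
print(deva_line)
```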
Yes, we converted it.
Can you give me the detailed log?
Using label smoothing of 0.1
Using gradient clipping norm of 1.0
Using softmax temperature of 1.0
Masking ratio: 0.3
Training for: ['hi', 'kn', 'bn']
Shuffling corpus!
Shuffling corpus!
Shuffling corpus!
Saving the model
Loading from checkpoint
Traceback (most recent call last):
File "pretrain_nmt.py", line 968, in
-- Process 0 terminated with the following error: Traceback (most recent call last): File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in wrap fn(i, *args) File "/home/aniruddha/yanmtt/pretrain_nmt.py", line 521, in model_create_load_run_save lprobs, labels, args.label_smoothing, ignore_index=tok.pad_token_id File "/home/aniruddha/yanmtt/common_utils.py", line 147, in label_smoothed_nll_loss smooth_loss.masked_fill(pad_mask, 0.0) RuntimeError: The expanded size of the tensor (383) must match the existing size (382) at non-singleton dimension 1. Target sizes: [8, 383, 1]. Tensor sizes: [8, 382, 1]
I mean the log right from the moment you ran the model. I need to see the tokenizer loading message etc.
IP address is localhost
Monolingual training files are: {'hi': '/home/aniruddha/all_data/train.hi', 'kn': '/home/aniruddha/all_data/train.kn', 'bn': '/home/aniruddha/all_data/train.bn'}
Sharding files into 1 parts
For language: hi the total number of lines are: 159354 and number of lines per shard are: 159354
File for language hi has been sharded.
For language: kn the total number of lines are: 56715 and number of lines per shard are: 56715
File for language kn has been sharded.
For language: bn the total number of lines are: 438796 and number of lines per shard are: 438796
File for language bn has been sharded.
Sharding files into 1 parts
Tokenizer is: PreTrainedTokenizer(name_or_path='ai4bharat/IndicBART', vocab_size=64000, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', special_tokens={'bos_token': '[CLS]', 'eos_token': '[SEP]', 'unk_token': '<unk>', 'sep_token': '[SEP]', 'pad_token': '<pad>', 'cls_token': '[CLS]', 'mask_token': AddedToken("[MASK]", rstrip=False, lstrip=True, single_word=False, normalized=True), 'additional_special_tokens': ['<s>', '</s>', '<2as>', '<2bn>', '<2en>', '<2gu>', '<2hi>', '<2kn>', '<2ml>', '<2mr>', '<2or>', '<2pa>', '<2ta>', '<2te>']})
Running DDP checkpoint example on rank 0.
We will do fp32 training
2022-08-29 21:44:47.535611: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-08-29 21:44:47.535653: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Using positional embeddings
Using positional embeddings
Memory consumed after moving model to GPU 0.91 GB
Memory consumed after wrapping model in DDP 2.06 GB
Optimizing ['module.model.shared.weight', 'module.model.encoder.embed_positions.weight', 'module.model.encoder.layers.0.self_attn.k_proj.weight', 'module.model.encoder.layers.0.self_attn.k_proj.bias', 'module.model.encoder.layers.0.self_attn.v_proj.weight', 'module.model.encoder.layers.0.self_attn.v_proj.bias', 'module.model.encoder.layers.0.self_attn.q_proj.weight', 'module.model.encoder.layers.0.self_attn.q_proj.bias', 'module.model.encoder.layers.0.self_attn.out_proj.weight', 'module.model.encoder.layers.0.self_attn.out_proj.bias', 'module.model.encoder.layers.0.self_attn_layer_norm.weight', 'module.model.encoder.layers.0.self_attn_layer_norm.bias', 'module.model.encoder.layers.0.fc1.weight', 'module.model.encoder.layers.0.fc1.bias', 'module.model.encoder.layers.0.fc2.weight', 'module.model.encoder.layers.0.fc2.bias', 'module.model.encoder.layers.0.final_layer_norm.weight', 'module.model.encoder.layers.0.final_layer_norm.bias', 'module.model.encoder.layers.1.self_attn.k_proj.weight', 'module.model.encoder.layers.1.self_attn.k_proj.bias', 'module.model.encoder.layers.1.self_attn.v_proj.weight', 'module.model.encoder.layers.1.self_attn.v_proj.bias', 'module.model.encoder.layers.1.self_attn.q_proj.weight', 'module.model.encoder.layers.1.self_attn.q_proj.bias', 'module.model.encoder.layers.1.self_attn.out_proj.weight', 'module.model.encoder.layers.1.self_attn.out_proj.bias', 'module.model.encoder.layers.1.self_attn_layer_norm.weight', 'module.model.encoder.layers.1.self_attn_layer_norm.bias', 'module.model.encoder.layers.1.fc1.weight', 'module.model.encoder.layers.1.fc1.bias', 'module.model.encoder.layers.1.fc2.weight', 'module.model.encoder.layers.1.fc2.bias', 'module.model.encoder.layers.1.final_layer_norm.weight', 'module.model.encoder.layers.1.final_layer_norm.bias', 'module.model.encoder.layers.2.self_attn.k_proj.weight', 'module.model.encoder.layers.2.self_attn.k_proj.bias', 'module.model.encoder.layers.2.self_attn.v_proj.weight', 'module.model.encoder.layers.2.self_attn.v_proj.bias', 'module.model.encoder.layers.2.self_attn.q_proj.weight', 'module.model.encoder.layers.2.self_attn.q_proj.bias', 'module.model.encoder.layers.2.self_attn.out_proj.weight', 'module.model.encoder.layers.2.self_attn.out_proj.bias', 'module.model.encoder.layers.2.self_attn_layer_norm.weight', 'module.model.encoder.layers.2.self_attn_layer_norm.bias', 'module.model.encoder.layers.2.fc1.weight', 'module.model.encoder.layers.2.fc1.bias', 'module.model.encoder.layers.2.fc2.weight', 'module.model.encoder.layers.2.fc2.bias', 'module.model.encoder.layers.2.final_layer_norm.weight', 'module.model.encoder.layers.2.final_layer_norm.bias', 'module.model.encoder.layers.3.self_attn.k_proj.weight', 'module.model.encoder.layers.3.self_attn.k_proj.bias', 'module.model.encoder.layers.3.self_attn.v_proj.weight', 'module.model.encoder.layers.3.self_attn.v_proj.bias', 'module.model.encoder.layers.3.self_attn.q_proj.weight', 'module.model.encoder.layers.3.self_attn.q_proj.bias', 'module.model.encoder.layers.3.self_attn.out_proj.weight', 'module.model.encoder.layers.3.self_attn.out_proj.bias', 'module.model.encoder.layers.3.self_attn_layer_norm.weight', 'module.model.encoder.layers.3.self_attn_layer_norm.bias', 'module.model.encoder.layers.3.fc1.weight', 'module.model.encoder.layers.3.fc1.bias', 'module.model.encoder.layers.3.fc2.weight', 'module.model.encoder.layers.3.fc2.bias', 'module.model.encoder.layers.3.final_layer_norm.weight', 'module.model.encoder.layers.3.final_layer_norm.bias', 
'module.model.encoder.layers.4.self_attn.k_proj.weight', 'module.model.encoder.layers.4.self_attn.k_proj.bias', 'module.model.encoder.layers.4.self_attn.v_proj.weight', 'module.model.encoder.layers.4.self_attn.v_proj.bias', 'module.model.encoder.layers.4.self_attn.q_proj.weight', 'module.model.encoder.layers.4.self_attn.q_proj.bias', 'module.model.encoder.layers.4.self_attn.out_proj.weight', 'module.model.encoder.layers.4.self_attn.out_proj.bias', 'module.model.encoder.layers.4.self_attn_layer_norm.weight', 'module.model.encoder.layers.4.self_attn_layer_norm.bias', 'module.model.encoder.layers.4.fc1.weight', 'module.model.encoder.layers.4.fc1.bias', 'module.model.encoder.layers.4.fc2.weight', 'module.model.encoder.layers.4.fc2.bias', 'module.model.encoder.layers.4.final_layer_norm.weight', 'module.model.encoder.layers.4.final_layer_norm.bias', 'module.model.encoder.layers.5.self_attn.k_proj.weight', 'module.model.encoder.layers.5.self_attn.k_proj.bias', 'module.model.encoder.layers.5.self_attn.v_proj.weight', 'module.model.encoder.layers.5.self_attn.v_proj.bias', 'module.model.encoder.layers.5.self_attn.q_proj.weight', 'module.model.encoder.layers.5.self_attn.q_proj.bias', 'module.model.encoder.layers.5.self_attn.out_proj.weight', 'module.model.encoder.layers.5.self_attn.out_proj.bias', 'module.model.encoder.layers.5.self_attn_layer_norm.weight', 'module.model.encoder.layers.5.self_attn_layer_norm.bias', 'module.model.encoder.layers.5.fc1.weight', 'module.model.encoder.layers.5.fc1.bias', 'module.model.encoder.layers.5.fc2.weight', 'module.model.encoder.layers.5.fc2.bias', 'module.model.encoder.layers.5.final_layer_norm.weight', 'module.model.encoder.layers.5.final_layer_norm.bias', 'module.model.encoder.layernorm_embedding.weight', 'module.model.encoder.layernorm_embedding.bias', 'module.model.encoder.layer_norm.weight', 'module.model.encoder.layer_norm.bias', 'module.model.decoder.embed_positions.weight', 'module.model.decoder.layers.0.self_attn.k_proj.weight', 'module.model.decoder.layers.0.self_attn.k_proj.bias', 'module.model.decoder.layers.0.self_attn.v_proj.weight', 'module.model.decoder.layers.0.self_attn.v_proj.bias', 'module.model.decoder.layers.0.self_attn.q_proj.weight', 'module.model.decoder.layers.0.self_attn.q_proj.bias', 'module.model.decoder.layers.0.self_attn.out_proj.weight', 'module.model.decoder.layers.0.self_attn.out_proj.bias', 'module.model.decoder.layers.0.self_attn_layer_norm.weight', 'module.model.decoder.layers.0.self_attn_layer_norm.bias', 'module.model.decoder.layers.0.encoder_attn.k_proj.weight', 'module.model.decoder.layers.0.encoder_attn.k_proj.bias', 'module.model.decoder.layers.0.encoder_attn.v_proj.weight', 'module.model.decoder.layers.0.encoder_attn.v_proj.bias', 'module.model.decoder.layers.0.encoder_attn.q_proj.weight', 'module.model.decoder.layers.0.encoder_attn.q_proj.bias', 'module.model.decoder.layers.0.encoder_attn.out_proj.weight', 'module.model.decoder.layers.0.encoder_attn.out_proj.bias', 'module.model.decoder.layers.0.encoder_attn_layer_norm.weight', 'module.model.decoder.layers.0.encoder_attn_layer_norm.bias', 'module.model.decoder.layers.0.fc1.weight', 'module.model.decoder.layers.0.fc1.bias', 'module.model.decoder.layers.0.fc2.weight', 'module.model.decoder.layers.0.fc2.bias', 'module.model.decoder.layers.0.final_layer_norm.weight', 'module.model.decoder.layers.0.final_layer_norm.bias', 'module.model.decoder.layers.1.self_attn.k_proj.weight', 'module.model.decoder.layers.1.self_attn.k_proj.bias', 
'module.model.decoder.layers.1.self_attn.v_proj.weight', 'module.model.decoder.layers.1.self_attn.v_proj.bias', 'module.model.decoder.layers.1.self_attn.q_proj.weight', 'module.model.decoder.layers.1.self_attn.q_proj.bias', 'module.model.decoder.layers.1.self_attn.out_proj.weight', 'module.model.decoder.layers.1.self_attn.out_proj.bias', 'module.model.decoder.layers.1.self_attn_layer_norm.weight', 'module.model.decoder.layers.1.self_attn_layer_norm.bias', 'module.model.decoder.layers.1.encoder_attn.k_proj.weight', 'module.model.decoder.layers.1.encoder_attn.k_proj.bias', 'module.model.decoder.layers.1.encoder_attn.v_proj.weight', 'module.model.decoder.layers.1.encoder_attn.v_proj.bias', 'module.model.decoder.layers.1.encoder_attn.q_proj.weight', 'module.model.decoder.layers.1.encoder_attn.q_proj.bias', 'module.model.decoder.layers.1.encoder_attn.out_proj.weight', 'module.model.decoder.layers.1.encoder_attn.out_proj.bias', 'module.model.decoder.layers.1.encoder_attn_layer_norm.weight', 'module.model.decoder.layers.1.encoder_attn_layer_norm.bias', 'module.model.decoder.layers.1.fc1.weight', 'module.model.decoder.layers.1.fc1.bias', 'module.model.decoder.layers.1.fc2.weight', 'module.model.decoder.layers.1.fc2.bias', 'module.model.decoder.layers.1.final_layer_norm.weight', 'module.model.decoder.layers.1.final_layer_norm.bias', 'module.model.decoder.layers.2.self_attn.k_proj.weight', 'module.model.decoder.layers.2.self_attn.k_proj.bias', 'module.model.decoder.layers.2.self_attn.v_proj.weight', 'module.model.decoder.layers.2.self_attn.v_proj.bias', 'module.model.decoder.layers.2.self_attn.q_proj.weight', 'module.model.decoder.layers.2.self_attn.q_proj.bias', 'module.model.decoder.layers.2.self_attn.out_proj.weight', 'module.model.decoder.layers.2.self_attn.out_proj.bias', 'module.model.decoder.layers.2.self_attn_layer_norm.weight', 'module.model.decoder.layers.2.self_attn_layer_norm.bias', 'module.model.decoder.layers.2.encoder_attn.k_proj.weight', 'module.model.decoder.layers.2.encoder_attn.k_proj.bias', 'module.model.decoder.layers.2.encoder_attn.v_proj.weight', 'module.model.decoder.layers.2.encoder_attn.v_proj.bias', 'module.model.decoder.layers.2.encoder_attn.q_proj.weight', 'module.model.decoder.layers.2.encoder_attn.q_proj.bias', 'module.model.decoder.layers.2.encoder_attn.out_proj.weight', 'module.model.decoder.layers.2.encoder_attn.out_proj.bias', 'module.model.decoder.layers.2.encoder_attn_layer_norm.weight', 'module.model.decoder.layers.2.encoder_attn_layer_norm.bias', 'module.model.decoder.layers.2.fc1.weight', 'module.model.decoder.layers.2.fc1.bias', 'module.model.decoder.layers.2.fc2.weight', 'module.model.decoder.layers.2.fc2.bias', 'module.model.decoder.layers.2.final_layer_norm.weight', 'module.model.decoder.layers.2.final_layer_norm.bias', 'module.model.decoder.layers.3.self_attn.k_proj.weight', 'module.model.decoder.layers.3.self_attn.k_proj.bias', 'module.model.decoder.layers.3.self_attn.v_proj.weight', 'module.model.decoder.layers.3.self_attn.v_proj.bias', 'module.model.decoder.layers.3.self_attn.q_proj.weight', 'module.model.decoder.layers.3.self_attn.q_proj.bias', 'module.model.decoder.layers.3.self_attn.out_proj.weight', 'module.model.decoder.layers.3.self_attn.out_proj.bias', 'module.model.decoder.layers.3.self_attn_layer_norm.weight', 'module.model.decoder.layers.3.self_attn_layer_norm.bias', 'module.model.decoder.layers.3.encoder_attn.k_proj.weight', 'module.model.decoder.layers.3.encoder_attn.k_proj.bias', 'module.model.decoder.layers.3.encoder_attn.v_proj.weight', 
'module.model.decoder.layers.3.encoder_attn.v_proj.bias', 'module.model.decoder.layers.3.encoder_attn.q_proj.weight', 'module.model.decoder.layers.3.encoder_attn.q_proj.bias', 'module.model.decoder.layers.3.encoder_attn.out_proj.weight', 'module.model.decoder.layers.3.encoder_attn.out_proj.bias', 'module.model.decoder.layers.3.encoder_attn_layer_norm.weight', 'module.model.decoder.layers.3.encoder_attn_layer_norm.bias', 'module.model.decoder.layers.3.fc1.weight', 'module.model.decoder.layers.3.fc1.bias', 'module.model.decoder.layers.3.fc2.weight', 'module.model.decoder.layers.3.fc2.bias', 'module.model.decoder.layers.3.final_layer_norm.weight', 'module.model.decoder.layers.3.final_layer_norm.bias', 'module.model.decoder.layers.4.self_attn.k_proj.weight', 'module.model.decoder.layers.4.self_attn.k_proj.bias', 'module.model.decoder.layers.4.self_attn.v_proj.weight', 'module.model.decoder.layers.4.self_attn.v_proj.bias', 'module.model.decoder.layers.4.self_attn.q_proj.weight', 'module.model.decoder.layers.4.self_attn.q_proj.bias', 'module.model.decoder.layers.4.self_attn.out_proj.weight', 'module.model.decoder.layers.4.self_attn.out_proj.bias', 'module.model.decoder.layers.4.self_attn_layer_norm.weight', 'module.model.decoder.layers.4.self_attn_layer_norm.bias', 'module.model.decoder.layers.4.encoder_attn.k_proj.weight', 'module.model.decoder.layers.4.encoder_attn.k_proj.bias', 'module.model.decoder.layers.4.encoder_attn.v_proj.weight', 'module.model.decoder.layers.4.encoder_attn.v_proj.bias', 'module.model.decoder.layers.4.encoder_attn.q_proj.weight', 'module.model.decoder.layers.4.encoder_attn.q_proj.bias', 'module.model.decoder.layers.4.encoder_attn.out_proj.weight', 'module.model.decoder.layers.4.encoder_attn.out_proj.bias', 'module.model.decoder.layers.4.encoder_attn_layer_norm.weight', 'module.model.decoder.layers.4.encoder_attn_layer_norm.bias', 'module.model.decoder.layers.4.fc1.weight', 'module.model.decoder.layers.4.fc1.bias', 'module.model.decoder.layers.4.fc2.weight', 'module.model.decoder.layers.4.fc2.bias', 'module.model.decoder.layers.4.final_layer_norm.weight', 'module.model.decoder.layers.4.final_layer_norm.bias', 'module.model.decoder.layers.5.self_attn.k_proj.weight', 'module.model.decoder.layers.5.self_attn.k_proj.bias', 'module.model.decoder.layers.5.self_attn.v_proj.weight', 'module.model.decoder.layers.5.self_attn.v_proj.bias', 'module.model.decoder.layers.5.self_attn.q_proj.weight', 'module.model.decoder.layers.5.self_attn.q_proj.bias', 'module.model.decoder.layers.5.self_attn.out_proj.weight', 'module.model.decoder.layers.5.self_attn.out_proj.bias', 'module.model.decoder.layers.5.self_attn_layer_norm.weight', 'module.model.decoder.layers.5.self_attn_layer_norm.bias', 'module.model.decoder.layers.5.encoder_attn.k_proj.weight', 'module.model.decoder.layers.5.encoder_attn.k_proj.bias', 'module.model.decoder.layers.5.encoder_attn.v_proj.weight', 'module.model.decoder.layers.5.encoder_attn.v_proj.bias', 'module.model.decoder.layers.5.encoder_attn.q_proj.weight', 'module.model.decoder.layers.5.encoder_attn.q_proj.bias', 'module.model.decoder.layers.5.encoder_attn.out_proj.weight', 'module.model.decoder.layers.5.encoder_attn.out_proj.bias', 'module.model.decoder.layers.5.encoder_attn_layer_norm.weight', 'module.model.decoder.layers.5.encoder_attn_layer_norm.bias', 'module.model.decoder.layers.5.fc1.weight', 'module.model.decoder.layers.5.fc1.bias', 'module.model.decoder.layers.5.fc2.weight', 'module.model.decoder.layers.5.fc2.bias', 
'module.model.decoder.layers.5.final_layer_norm.weight', 'module.model.decoder.layers.5.final_layer_norm.bias', 'module.model.decoder.layernorm_embedding.weight', 'module.model.decoder.layernorm_embedding.bias', 'module.model.decoder.layer_norm.weight', 'module.model.decoder.layer_norm.bias']
Number of model parameters: 244017152
Total number of params to be optimized are: 244017152
Percentage of parameters to be optimized: 100.0
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Initial LR is: 1.25e-07
Training from official pretrained model
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:216: UserWarning: Please also save or load the state of the optimizer when saving or loading the scheduler.
warnings.warn(SAVE_STATE_WARNING, UserWarning)
/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:234: UserWarning: Please also save or load the state of the optimizer when saving or loading the scheduler.
warnings.warn(SAVE_STATE_WARNING, UserWarning)
Using label smoothing of 0.1
Using gradient clipping norm of 1.0
Using softmax temperature of 1.0
Masking ratio: 0.3
Training for: ['hi', 'kn', 'bn']
Shuffling corpus!
Shuffling corpus!
Shuffling corpus!
Saving the model
Loading from checkpoint
Traceback (most recent call last):
File "pretrain_nmt.py", line 968, in
-- Process 0 terminated with the following error: Traceback (most recent call last): File "/home/aniruddha/anaconda3/envs/pretam/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in wrap fn(i, *args) File "/home/aniruddha/yanmtt/pretrain_nmt.py", line 521, in model_create
Your problem is likely here
PreTrainedTokenizer(name_or_path='ai4bharat/IndicBART', vocab_size=64000, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', special_tokens={'bos_token': '[CLS]', 'eos_token': '[SEP]', 'unk_token': '<unk>', 'sep_token': '[SEP]', 'pad_token': '<pad>', 'cls_token': '[CLS]', 'mask_token': AddedToken("[MASK]", rstrip=False, lstrip=True, single_word=False, normalized=True), 'additional_special_tokens': ['<s>', '</s>', '<2as>', '<2bn>', '<2en>', '<2gu>', '<2hi>', '<2kn>', '<2ml>', '<2mr>', '<2or>', '<2pa>', '<2ta>', '<2te>']})
Running DDP checkpoint example on rank 0.
We will do fp32 training
2022-08-29 21:44:47.535611: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-08-29 21:44:47.535653: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Firstly, I am not sure why the tokens <s> and </s> appear to be missing from the tokenizer. Right now they seem to be empty. I am 99% sure that the code changes I made last week were correct. I will check again just to be sure, but even without that, there seems to be some issue with your CUDA installation.
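A quick way to verify whether those tokens really exist is to load the tokenizer and look up their ids; a sketch (the loading flags follow the IndicBART README, so treat them as assumptions):

```python
from transformers import AlbertTokenizer

tok = AlbertTokenizer.from_pretrained(
    "ai4bharat/IndicBART", do_lower_case=False, use_fast=False, keep_accents=True
)

# If any of these print the unk id, the token is genuinely missing from the
# vocabulary; otherwise its absence in the log was just a display artifact.
for t in ["<s>", "</s>", "<2hi>", "<2kn>", "<2bn>"]:
    print(t, tok.convert_tokens_to_ids(t))
print("unk id:", tok.unk_token_id)
```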
Actually, we are using a DGX A100 server, and CUDA was installed by NVIDIA itself.
Oh wait, the tokens are not missing. The confusion arose because you copy-pasted the log without wrapping it in `` (code fences).
For example: <s> and </s> will be displayed weirdly, with a dash through the word "and" (since <s> is the HTML strikethrough tag).
But: `<s>` and `</s>` are displayed correctly.
In the future, please post logs wrapped in ``.
As for the DGX, I don't think that is the problem. However, I have never worked with a DGX, so I can't be sure, and since I don't have one, I can't debug DGX-specific issues.
Nevertheless, the following error needs to be solved:
2022-08-29 21:44:47.535611: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-08-29 21:44:47.535653: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Until that is fixed, I can't think of any other solution. I recommend googling "Could not load dynamic library 'libcudart.so.10.1';" and solving that issue first.
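One quick sanity check on either server (standard PyTorch calls, nothing yanmtt-specific):

```python
import torch

# If this prints False, PyTorch cannot see a GPU and the libcudart
# error above is the first thing to fix.
print("CUDA available:", torch.cuda.is_available())
print("Built against CUDA:", torch.version.cuda)
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```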
But when I pass only one language, it works.
The problem could be your data as well.
Can you try running with each language individually to identify the problematic one?
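Something like the original command reduced to a single language (all other flags unchanged), once per language:

python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model ai4bharat/IndicBART --tokenizer_name_or_path ai4bharat/IndicBART --langs kn --mono_src /home/aniruddha/all_data/train.kn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path aibharat/IndicBART/model --port 7878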
कोई भी एजेंसी-होल्डर अधिक पैसे लेने के उद्देश्य से पाठ्य पुस्तकों पर जिल्दबन्दी नहीं कर सकता । पंजाब स्कूल शिक्षा बोर्ड द्वारा मुद्रित तथा प्रकाशित पाठ्य पुस्तकों कोतकों) की छपाई, प्रकाशन, स्टॉक करना, जमाखोरी या बिक्री आदि करना भारतीय दंड प्रणाली के अन्तर्गत गैरकानूनी जुर्म है । सचिव, पंजाब स्कूल शिक्षा बोर्ड, विद्यागर-160062 द्वारा प्रकाशित तथा मैस पंजाब किताब घर, जालन्धर द्वारा मुद्रित ।
The above is a sample of the Hindi data.
I ran it on another server; the same problem happens, but now the CUDA problem does not show up.
Hi,
I figured out the issue.
The problem is that when official models are used, you need to pass the language indicator tokens directly to the script.
So where you previously passed --langs hi,kn,bn, you now need to pass --langs "<2hi>,<2kn>,<2bn>".
Try it and let me know.
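Applied to your original command, that would be (everything else unchanged):

python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --pretrained_model ai4bharat/IndicBART --tokenizer_name_or_path ai4bharat/IndicBART --langs "<2hi>,<2kn>,<2bn>" --mono_src /home/aniruddha/all_data/train.hi,/home/aniruddha/all_data/train.kn,/home/aniruddha/all_data/train.bn --batch_size 8 --batch_size_indicates_lines --shard_files --model_path aibharat/IndicBART/model --port 7878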