
tokenizer.save_vocabulary()

Open kkavyashankar0009 opened this issue 3 years ago • 7 comments

I'm trying to save and load a fine-tuned model that I trained following the guide at https://huggingface.co/transformers/v1.0.0/model_doc/overview.html#loading-google-ai-or-openai-pre-trained-weights-or-pytorch-dump, but I'm facing the error below. Version: transformers 4.20.1

```
Traceback (most recent call last):
  File "/home/kshankar/Desktop/Project/Zero_Shot_updated/Fine-tuning/snli_bert_base_uncased3.py", line 352, in
    Acc, output_config_file, output_vocab_file, output_model_file = train(model=model, epochs=EPOCHS, train_data_loader=train_dataloader, val_data_loader=val_dataloader,
  File "/home/kshankar/Desktop/Project/Zero_Shot_updated/Fine-tuning/snli_bert_base_uncased3.py", line 313, in train
    tokenizer.save_vocabulary(output_vocab_file)
  File "/home/kshankar/miniconda3/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert_fast.py", line 303, in save_vocabulary
    files = self._tokenizer.model.save(save_directory, name=filename_prefix)
Exception: No such file or directory (os error 2)
```
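For context: in the traceback, `save_vocabulary` receives `output_vocab_file` (a file path), while the underlying `tokenizers` `model.save(save_directory, ...)` expects a path to an existing directory it can write vocabulary files into. Joining a filename onto a path that is not an existing directory raises os error 2. A minimal stdlib sketch of that failure mode (the `save_to_directory` helper is a hypothetical stand-in for the library call, not the real API):

```python
import os
import tempfile

def save_to_directory(save_directory, filename="vocab.txt"):
    # Mimics the library behavior: the first argument must be an
    # EXISTING directory; the function joins the filename onto it.
    path = os.path.join(save_directory, filename)
    with open(path, "w") as f:
        f.write("[PAD]\n[UNK]\n")
    return path

tmp = tempfile.mkdtemp()

# Passing an existing directory works.
saved = save_to_directory(tmp)

# Passing a file path (as in the traceback) fails: the join produces
# ".../my_own_vocab_file.bin/vocab.txt", whose parent "directory"
# does not exist, so open() raises FileNotFoundError (errno 2).
try:
    save_to_directory(os.path.join(tmp, "my_own_vocab_file.bin"))
except FileNotFoundError as e:
    print("failed as expected:", e)
```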

kkavyashankar0009 avatar Jul 24 '22 15:07 kkavyashankar0009

Hey @kkavyashankar0009 ,

Have you tried creating that directory?

Narsil avatar Jul 25 '22 08:07 Narsil

Yes

kkavyashankar0009 avatar Jul 25 '22 08:07 kkavyashankar0009

Can you give a reproducible example?

Narsil avatar Jul 25 '22 09:07 Narsil

I'm choosing the best loss and trying to save the model:

```python
if val_loss < best_loss:
    print("Best validation loss improved from {} to {}".format(best_loss, val_loss))
    print()
    net_copy = copy.deepcopy(model)  # save a copy of the model
    best_loss = val_loss
    best_ep = e + 1
    print("Saving model")

    model_to_save = net_copy
    os.makedirs(os.path.join(PATH_, "models"), exist_ok=True)  # succeeds even if directory exists
    output_model_file = os.path.join(PATH_, "models", "my_own_model_file.bin")
    output_config_file = os.path.join(PATH_, "models", "my_own_config_file.bin")
    output_vocab_file = os.path.join(PATH_, "models", "my_own_vocab_file.bin")
    output_dir = os.path.join(PATH_, "models")

    torch.save(model_to_save.state_dict(), output_model_file)
    model_to_save.config.to_json_file(output_config_file)
    tokenizer.save_vocabulary(output_dir)
    print('saved as pretrained')
```

kkavyashankar0009 avatar Jul 25 '22 18:07 kkavyashankar0009

Hi @kkavyashankar0009 ,

Sorry, but this is only an extract of your code, so I can't reproduce the problem: it has many missing bits and many things entirely unrelated to the bug in question.

My first suggestion is to not use save_vocabulary but save_pretrained, which is likely to work better.

Then, in order to debug, it would really help if you could extract the problematic code from your codebase into a shareable example that shows the issue. In its current state, it's impossible for us to help you.

Narsil avatar Jul 26 '22 06:07 Narsil
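For readers wondering why `save_pretrained` is the more robust choice: it writes the config, vocabulary, and related files into a single directory that it creates itself, and `from_pretrained` reads everything back from that one place. A rough stdlib sketch of that pattern (the two functions below are hypothetical illustrations of the idea, not the transformers API):

```python
import json
import os
import tempfile

def save_pretrained_sketch(save_directory, config, vocab):
    # Everything goes into ONE directory, created up front, so no
    # individual file path can point at a missing location.
    os.makedirs(save_directory, exist_ok=True)
    with open(os.path.join(save_directory, "config.json"), "w") as f:
        json.dump(config, f)
    with open(os.path.join(save_directory, "vocab.txt"), "w") as f:
        f.write("\n".join(vocab))

def from_pretrained_sketch(save_directory):
    # Load both artifacts back from the same directory.
    with open(os.path.join(save_directory, "config.json")) as f:
        config = json.load(f)
    with open(os.path.join(save_directory, "vocab.txt")) as f:
        vocab = f.read().splitlines()
    return config, vocab

out = os.path.join(tempfile.mkdtemp(), "models")
save_pretrained_sketch(out, {"hidden_size": 768}, ["[PAD]", "[UNK]"])
config, vocab = from_pretrained_sketch(out)
```

The design point is that the caller only ever names a directory, never the individual files, which sidesteps the file-path-vs-directory confusion behind this issue.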

Thank you for the response. The issue is resolved.

kkavyashankar0009 avatar Jul 27 '22 19:07 kkavyashankar0009

Do you mind sharing what the issue was? It could help future readers.

Narsil avatar Jul 28 '22 06:07 Narsil

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Feb 09 '24 01:02 github-actions[bot]