
tokenizer.save_vocabulary()

Open kkavyashankar0009 opened this issue 3 years ago • 7 comments

I'm trying to save and load a fine-tuned model that I trained following the guide at https://huggingface.co/transformers/v1.0.0/model_doc/overview.html#loading-google-ai-or-openai-pre-trained-weights-or-pytorch-dump, but I'm facing the error below. Version: transformers 4.20.1

```
Traceback (most recent call last):
  File "/home/kshankar/Desktop/Project/Zero_Shot_updated/Fine-tuning/snli_bert_base_uncased3.py", line 352, in
    Acc, output_config_file, output_vocab_file, output_model_file = train(model=model, epochs=EPOCHS, train_data_loader=train_dataloader, val_data_loader=val_dataloader,
  File "/home/kshankar/Desktop/Project/Zero_Shot_updated/Fine-tuning/snli_bert_base_uncased3.py", line 313, in train
    tokenizer.save_vocabulary(output_vocab_file)
  File "/home/kshankar/miniconda3/lib/python3.9/site-packages/transformers/models/bert/tokenization_bert_fast.py", line 303, in save_vocabulary
    files = self._tokenizer.model.save(save_directory, name=filename_prefix)
Exception: No such file or directory (os error 2)
```
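For context: in the traceback, `save_vocabulary` receives `output_vocab_file` (a file path), while the underlying `tokenizers` `model.save(save_directory, ...)` expects a path to an existing directory it can write vocabulary files into. Joining a filename onto a path that is not an existing directory raises os error 2. A minimal stdlib sketch of that failure mode (the `save_to_directory` helper is a hypothetical stand-in for the library call, not the real API):

```python
import os
import tempfile

def save_to_directory(save_directory, filename="vocab.txt"):
    # Mimics the library behavior: the first argument must be an
    # EXISTING directory; the function joins the filename onto it.
    path = os.path.join(save_directory, filename)
    with open(path, "w") as f:
        f.write("[PAD]\n[UNK]\n")
    return path

tmp = tempfile.mkdtemp()

# Passing an existing directory works.
saved = save_to_directory(tmp)

# Passing a file path (as in the traceback) fails: the join produces
# ".../my_own_vocab_file.bin/vocab.txt", whose parent "directory"
# does not exist, so open() raises FileNotFoundError (errno 2).
try:
    save_to_directory(os.path.join(tmp, "my_own_vocab_file.bin"))
except FileNotFoundError as e:
    print("failed as expected:", e)
```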

kkavyashankar0009 avatar Jul 24 '22 15:07 kkavyashankar0009

Hey @kkavyashankar0009 ,

Have you tried creating that directory?

Narsil avatar Jul 25 '22 08:07 Narsil

Yes

kkavyashankar0009 avatar Jul 25 '22 08:07 kkavyashankar0009

Can you give a reproducible example?

Narsil avatar Jul 25 '22 09:07 Narsil

I'm choosing the best loss and trying to save the model:

```python
if val_loss < best_loss:
    print("Best validation loss improved from {} to {}".format(best_loss, val_loss))
    print()
    net_copy = copy.deepcopy(model)  # save a copy of the model
    best_loss = val_loss
    best_ep = e + 1
    print("Saving model")

    model_to_save = net_copy
    os.makedirs(os.path.join(PATH_, "models"), exist_ok=True)  # succeeds even if directory exists
    output_model_file = os.path.join(PATH_, "models", "my_own_model_file.bin")
    output_config_file = os.path.join(PATH_, "models", "my_own_config_file.bin")
    output_vocab_file = os.path.join(PATH_, "models", "my_own_vocab_file.bin")
    output_dir = os.path.join(PATH_, "models")

    torch.save(model_to_save.state_dict(), output_model_file)
    model_to_save.config.to_json_file(output_config_file)
    tokenizer.save_vocabulary(output_dir)
    print('saved as pretrained')
```

kkavyashankar0009 avatar Jul 25 '22 18:07 kkavyashankar0009

Hi @kkavyashankar0009 ,

Sorry, but this is only an extract of your code, so I can't reproduce the problem: it has many missing bits and many things entirely unrelated to the bug in question.

My first suggestion is to not use save_vocabulary but save_pretrained, which is likely to work better.

Then, in order to debug, it would really help if you could extract the problematic code from your codebase into a shareable example that shows the issue. In its current state, it's impossible for us to help you.

Narsil avatar Jul 26 '22 06:07 Narsil
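For readers wondering why `save_pretrained` is the more robust choice: it writes the config, vocabulary, and related files into a single directory that it creates itself, and `from_pretrained` reads everything back from that one place. A rough stdlib sketch of that pattern (the two functions below are hypothetical illustrations of the idea, not the transformers API):

```python
import json
import os
import tempfile

def save_pretrained_sketch(save_directory, config, vocab):
    # Everything goes into ONE directory, created up front, so no
    # individual file path can point at a missing location.
    os.makedirs(save_directory, exist_ok=True)
    with open(os.path.join(save_directory, "config.json"), "w") as f:
        json.dump(config, f)
    with open(os.path.join(save_directory, "vocab.txt"), "w") as f:
        f.write("\n".join(vocab))

def from_pretrained_sketch(save_directory):
    # Load both artifacts back from the same directory.
    with open(os.path.join(save_directory, "config.json")) as f:
        config = json.load(f)
    with open(os.path.join(save_directory, "vocab.txt")) as f:
        vocab = f.read().splitlines()
    return config, vocab

out = os.path.join(tempfile.mkdtemp(), "models")
save_pretrained_sketch(out, {"hidden_size": 768}, ["[PAD]", "[UNK]"])
config, vocab = from_pretrained_sketch(out)
```

The design point is that the caller only ever names a directory, never the individual files, which sidesteps the file-path-vs-directory confusion behind this issue.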

Thank you for the response. The issue is resolved.

kkavyashankar0009 avatar Jul 27 '22 19:07 kkavyashankar0009

Do you mind sharing what the issue was? It could help future readers.

Narsil avatar Jul 28 '22 06:07 Narsil

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Feb 09 '24 01:02 github-actions[bot]