BERT-WSD

How to use models from Hugging Face?

Open JamesArthurHolland opened this issue 4 years ago • 3 comments

How do I use models that aren't in the specified list?

I would like to use this model:

https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased

How do I go about doing this?

Regards,

Jamie

JamesArthurHolland, Oct 12 '21 21:10

Hi Jamie,

One way is to first download the weights, vocab, and config files to a local folder, then set the --model_name_or_path flag to the path of that folder.
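As a rough illustration of the download step, here is a minimal sketch that uses `snapshot_download` from the `huggingface_hub` package. It assumes the package is installed and that the Hub repository actually hosts all three files; the helper name `download_to_folder` and the `WANTED_FILES` list are made up for this example and are not part of BERT-WSD.

```python
from pathlib import Path

# Assumed file names for a PyTorch BERT checkpoint; adjust if your model differs.
WANTED_FILES = ["pytorch_model.bin", "vocab.txt", "config.json"]


def download_to_folder(repo_id: str, folder: str) -> Path:
    """Copy the wanted files of a Hub repository into `folder` and return its path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,              # e.g. "dccuchile/bert-base-spanish-wwm-uncased"
        local_dir=folder,             # plain local folder instead of the hidden cache
        allow_patterns=WANTED_FILES,  # skip TensorFlow/Flax weights and other extras
    )
    return Path(path)
```

Calling `download_to_folder("dccuchile/bert-base-spanish-wwm-uncased", "beto-uncased")` and then passing `--model_name_or_path beto-uncased` would follow the suggestion above, provided the download succeeds.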

BPYap, Oct 19 '21 09:10

I'm very unfamiliar with these formats. I downloaded the TensorFlow package for the Spanish uncased model, and it only has the following files:

model.ckpt-2000000.index
model.ckpt-2000000.data-00000-of-00001
model.ckpt-2000000.meta

The PyTorch version only has:

pytorch_model.bin

But the BERT-WSD library appears to look for a config file, which you also mentioned. Is this specific to the TensorFlow version?

JamesArthurHolland, Oct 19 '21 17:10

You only need pytorch_model.bin, together with vocab.txt and config.json, in the same directory. It seems that the links for the vocab and config files are broken in the Hugging Face model repository. On closer inspection, I found working links in the Colaboratory notebook provided by the authors: https://colab.research.google.com/drive/1uRwg4UmPgYIqGYY4gW_Nsw9782GFJbPt
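A quick way to confirm the folder is complete before pointing --model_name_or_path at it could look like the sketch below. The helper name `missing_model_files` is hypothetical; it only checks that the three files named above exist and does not validate their contents.

```python
from pathlib import Path

# The three files named in this reply.
REQUIRED_FILES = ("pytorch_model.bin", "vocab.txt", "config.json")


def missing_model_files(folder):
    """Return the required file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means all three files are in place; anything else names what still has to be downloaded.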

You can obtain the two files from the following links:
https://users.dcc.uchile.cl/~jperez/beto/cased_2M/vocab.txt
https://users.dcc.uchile.cl/~jperez/beto/cased_2M/config.json
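If you would rather script the download, a minimal sketch using only the Python standard library might look like this. The helper name `fetch_into` and its `fetch` parameter are made up for this example; the URLs are the two given above, and whether they still resolve is not guaranteed.

```python
from pathlib import Path
from urllib.request import urlretrieve

# The two links given in this reply.
BETO_URLS = [
    "https://users.dcc.uchile.cl/~jperez/beto/cased_2M/vocab.txt",
    "https://users.dcc.uchile.cl/~jperez/beto/cased_2M/config.json",
]


def fetch_into(folder, urls, fetch=urlretrieve):
    """Download each URL into `folder`, keeping the last path segment as the file name."""
    root = Path(folder)
    root.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = root / url.rsplit("/", 1)[-1]
        fetch(url, str(target))  # urlretrieve(url, filename) writes the file to disk
        saved.append(target)
    return saved
```

Running `fetch_into("beto", BETO_URLS)` would place both files in a `beto` folder, next to which pytorch_model.bin still has to be copied. Note that these links sit under a cased_2M directory, so it is worth checking that the vocab and config match the weights you actually downloaded.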

Hope it helps. Cheers.

BPYap, Oct 20 '21 05:10