Is there a vocabulary for XLNet?

Open cotitan opened this issue 5 years ago • 3 comments

I wonder if there is a vocabulary file for XLNet, so that, given a sentence, I could generate input_ids from this vocab myself instead of getting them from prepro_utils.encode_ids().

cotitan avatar Jul 07 '19 11:07 cotitan

prepro_utils.encode_ids() essentially just wraps SentencePiece, so this question is more appropriate for the SentencePiece repo: https://github.com/google/sentencepiece. The "vocab" is really in the spiece.model file.

Let me know if you have any more questions or if you have a specific use-case where you need a vocab file.
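If a plain-text vocab file is still needed, one possible approach is to dump the pieces stored in spiece.model. The following is a minimal sketch, not part of this repo: the helper names build_vocab, write_vocab and load_processor are hypothetical, and the paths "spiece.model" and "vocab.txt" are assumptions. It relies only on the SentencePiece Python calls Load, GetPieceSize and IdToPiece.

```python
def build_vocab(sp):
    """Return a list in which index i holds the piece string for id i.

    `sp` is expected to be a loaded sentencepiece.SentencePieceProcessor
    (anything exposing GetPieceSize() and IdToPiece(i) works).
    """
    return [sp.IdToPiece(i) for i in range(sp.GetPieceSize())]


def write_vocab(pieces, path):
    """Write one piece per line; the zero-based line index is the piece id."""
    with open(path, "w", encoding="utf-8") as f:
        for piece in pieces:
            f.write(piece + "\n")


def load_processor(model_path):
    """Load a SentencePiece model file (hypothetical helper)."""
    # Imported lazily so the helpers above can be used without the package.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.Load(model_path)
    return sp


# Example usage (assumed paths, adjust to your checkpoint directory):
# sp = load_processor("spiece.model")
# write_vocab(build_vocab(sp), "vocab.txt")
```

The resulting file lists subword pieces rather than whole words, so ids produced by prepro_utils.encode_ids() should correspond to line indices in it, assuming the same model file is used.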

lukemelas avatar Jul 08 '19 20:07 lukemelas

I'm doing some research on text summarization, where a vocabulary file is important. At the decoding stage, we also need to generate a sequence word by word, with each word drawn from a vocabulary.
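One thing to keep in mind for decoding: a SentencePiece vocabulary contains subword pieces, not words, so a decoder would emit one piece per step and the pieces have to be joined afterwards. A minimal sketch of that joining step is below. The function name pieces_to_text is hypothetical, and it assumes the default SentencePiece convention of marking word starts with the "▁" (U+2581) character.

```python
WORD_START = "\u2581"  # SentencePiece's word-boundary marker


def pieces_to_text(pieces):
    """Join generated subword pieces back into a plain string.

    Concatenates the pieces, then turns each word-boundary marker
    into a space and trims the leading space.
    """
    return "".join(pieces).replace(WORD_START, " ").strip()


# Pieces as a decoder might emit them, one per step (made-up example):
generated = ["\u2581Hello", "\u2581wor", "ld", "."]
print(pieces_to_text(generated))
```

With a loaded processor, calling its DecodeIds method on the generated id list should give the same kind of result without handling the marker by hand.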

cotitan avatar Jul 09 '19 08:07 cotitan

I need a vocab file when running run_squad.py.

guddu0007 avatar Oct 14 '19 10:10 guddu0007