SentencePiece-based Language

Open tamuhey opened this issue 4 years ago • 1 comments

feature request:

SentencePiece is the tokenizer used in XLNet.
I think that if a `Language` tokenized text with SentencePiece directly, the alignment step between spaCy tokens and the model's subword pieces could be skipped, which would make the model more efficient.
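
To illustrate the alignment step in question, here is a minimal pure-Python sketch. It is not spaCy's or any transformer wrapper's actual implementation; the function name `align_pieces_to_words` is hypothetical, and the pieces are written by hand rather than produced by a real SentencePiece model. It assumes the pieces, with the `▁` whitespace marker removed, spell out exactly the same characters as the words.

```python
from typing import List

WS = "\u2581"  # the "▁" character SentencePiece uses to mark a preceding space


def align_pieces_to_words(words: List[str], pieces: List[str]) -> List[List[int]]:
    """For each word, return the indices of the subword pieces that cover it.

    Hypothetical helper: matches by character counts only and assumes the
    pieces (minus the whitespace marker) concatenate to the same text as
    the words.
    """
    alignment: List[List[int]] = [[] for _ in words]
    word_idx = 0   # word currently being covered
    consumed = 0   # characters of that word already covered by pieces
    for i, piece in enumerate(pieces):
        remaining = len(piece.replace(WS, ""))
        while remaining > 0:
            if word_idx >= len(words):
                raise ValueError("pieces contain more text than words")
            alignment[word_idx].append(i)
            take = min(remaining, len(words[word_idx]) - consumed)
            consumed += take
            remaining -= take
            if consumed == len(words[word_idx]):
                # Current word fully covered; move on to the next one.
                word_idx += 1
                consumed = 0
    return alignment


# Word-level tokens (as a rule-based tokenizer might produce them)
words = ["I", "like", "tokenizers", "."]
# Hand-written subword pieces in SentencePiece style
pieces = [WS + "I", WS + "like", WS + "token", "izer", "s", "."]

alignment = align_pieces_to_words(words, pieces)

# If the pieces themselves were the tokens, the mapping would be the identity
# and this bookkeeping would not be needed.
identity = align_pieces_to_words(["I", "like", "token", "izer", "s", "."], pieces)
```

With word-level tokens, each word maps to one or more pieces and every downstream feature has to be pooled or scattered through that mapping. With piece-level tokens the mapping is one-to-one, which is the saving the request is after.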

tamuhey avatar Oct 21 '19 10:10 tamuhey

I added this functionality to camphr.

Documentation: https://camphr.readthedocs.io/en/latest/notes/sentencepiece.html

tamuhey avatar Apr 09 '20 06:04 tamuhey