Pengzhi Gao
Hi @ZhitingHu , could you help answer this question?
Sorry for the late reply. We will take a look at this issue.
This task includes the `ELMoEmbedder`, adapted from `allennlp`, and the corresponding `ELMoTokenizer`, adapted from `allennlp` and [MosesTokenizer](https://github.com/alvations/sacremoses).
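To give a feel for what the Moses-style tokenization step does, here is a deliberately simplified, pure-Python sketch; the real `MosesTokenizer` in `sacremoses` handles many more cases (apostrophes, non-breaking prefixes, character escaping), so this is only an illustration, and the function name is made up:

```python
import re

def moses_like_tokenize(text: str) -> list:
    # Simplified illustration of Moses-style tokenization:
    # pad common punctuation with spaces, then split on whitespace.
    # The real sacremoses MosesTokenizer applies many more rules.
    text = re.sub(r"([.,!?;:()\"])", r" \1 ", text)
    return text.split()

tokens = moses_like_tokenize("Hello, world!")
# punctuation is split off as separate tokens
```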
For the tokenizer part, see: https://github.com/allenai/allennlp/issues/1933
See Zecong's comments in #208
Yes, I think we should support this feature. Since pre-trained tokenizers already take care of the corresponding vocabulary files and special tokens, it is unnecessary to require a vocabulary file...
We deleted the `sentencepiece` vocab file because the `sentencepiece` model file is purely self-contained, and the vocab file is never used in the tokenizer. To the best of my knowledge, the vocab file...
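A toy sketch of what "self-contained" means here: the vocabulary lives inside the serialized model itself, so loading the model file alone is enough and no separate vocab file is needed. This uses a JSON stand-in rather than a real `sentencepiece` model, purely to illustrate the point:

```python
import json
import os
import tempfile

# Toy stand-in for a self-contained tokenizer model: the piece -> id
# mapping is serialized inside the model file itself. A real
# sentencepiece .model file likewise carries its pieces internally,
# which is why a separate vocab file is redundant.
model = {"pieces": {"<unk>": 0, "<s>": 1, "</s>": 2, "\u2581hello": 3, "\u2581world": 4}}

with tempfile.NamedTemporaryFile("w", suffix=".model", delete=False) as f:
    json.dump(model, f)
    path = f.name

# Loading the model file alone recovers the full vocabulary.
with open(path) as f:
    loaded = json.load(f)
vocab = loaded["pieces"]
os.unlink(path)
```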
Could you write down how you integrate `tokenizer` with `pairedtextdata`? There is another related issue, #256. I think we should provide an interface to use `tokenizer` instead of `vocab`. Do...
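One possible shape for such an interface, sketched with hypothetical names (neither class below is the actual Texar API): the data module takes a tokenizer object, and the tokenizer carries its own vocabulary and special tokens, so no separate vocab file is passed in.

```python
class WhitespaceTokenizer:
    """Hypothetical minimal tokenizer that owns its vocab and specials."""
    def __init__(self, words):
        self.special_tokens = ["<pad>", "<unk>"]
        self.vocab = {tok: i for i, tok in
                      enumerate(self.special_tokens + words)}

    def map_text_to_id(self, text):
        # Unknown words fall back to the <unk> id.
        unk = self.vocab["<unk>"]
        return [self.vocab.get(t, unk) for t in text.split()]

class PairedTextDataSketch:
    """Hypothetical data module taking a tokenizer instead of a vocab
    file, as proposed in the discussion (illustrative names only)."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def process(self, src, tgt):
        return (self.tokenizer.map_text_to_id(src),
                self.tokenizer.map_text_to_id(tgt))

tok = WhitespaceTokenizer(["hello", "world"])
data = PairedTextDataSketch(tok)
src_ids, tgt_ids = data.process("hello world", "world hello")
```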
Please merge `master` into your branch to pass the `codecov` test.