Sotaro Takeshita / 竹下 颯太郎
**Describe the bug** The Google Drive link for downloading the WMT14 dataset is no longer available. **To Reproduce**
```py
import lineflow.datasets as lfds

train_dataset = lfds.Wmt14("train")
```
**Expected behavior** A clear and...
**Is your feature request related to a problem? Please describe.** The mean of the token embeddings from BERT is known to perform better than the embedding from BERT. But currently, the BERT encoder returns...
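For reference, mean pooling over token embeddings can be sketched as below. This is a minimal illustration and not the library's encoder: the function name `mean_pool` and the plain-list inputs are assumptions made for the example, and the attention mask is assumed to mark padding positions with 0.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is non-zero.

    Hypothetical helper for illustration; padding tokens (mask == 0)
    are excluded so they do not dilute the sentence vector.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [component / count for component in total]


# Three token vectors, the last one being padding.
sentence_vector = mean_pool(
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
    [1, 1, 0],
)
```

With real BERT outputs the same idea would be applied to the last hidden states, using the tokenizer's attention mask.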
**Is your feature request related to a problem? Please describe.** Allow custom models for word embeddings, not only predefined models online. **Describe the solution you'd like** **Describe alternatives you've considered**...
**Describe the bug** `urllib` does not work well on macOS Catalina, so use `requests` instead. Ref: https://stackoverflow.com/questions/57630314/ssl-certificate-verify-failed-error-with-python3-on-macos-10-15
**Is your feature request related to a problem? Please describe.** Support fine-tuning word vectors on downstream tasks. **Describe the solution you'd like** Load word embeddings into a `torch.nn.Module`. **Additional context** ref:...
**Is your feature request related to a problem? Please describe.** When word embeddings are extracted many times in one process, performance would improve if `get_word_vector` cached its results. **Describe...
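One way the caching could look, as a rough sketch using only the standard library. The names `make_cached_lookup` and `slow_get_word_vector` are hypothetical stand-ins for illustration, not the library's API; vectors are returned as tuples here because a cached mutable array would be shared between callers.

```python
from functools import lru_cache
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]


def make_cached_lookup(
    lookup: Callable[[str], Vector], maxsize: int = 10000
) -> Callable[[str], Vector]:
    """Wrap a word -> vector function with an LRU cache (hypothetical helper)."""
    return lru_cache(maxsize=maxsize)(lookup)


# Records every real (uncached) lookup so the effect of the cache is visible.
calls: List[str] = []


def slow_get_word_vector(word: str) -> Vector:
    """Stand-in for an expensive embedding lookup; returns a dummy vector."""
    calls.append(word)
    return tuple(float(ord(char)) for char in word)


get_word_vector = make_cached_lookup(slow_get_word_vector)

get_word_vector("cat")  # computed and stored
get_word_vector("cat")  # served from the cache, no second lookup
```

The `maxsize` bound keeps memory use predictable for large vocabularies; `get_word_vector.cache_info()` reports hits and misses if the hit rate needs checking.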
**Is your feature request related to a problem? Please describe.** For a lightweight sentence embedding, implement [this](https://arxiv.org/pdf/1906.08340.pdf) model. **Describe the solution you'd like** Implement the "Hard threshold" model described in...