Kaito Sugimoto

30 issues by Kaito Sugimoto

When I trained my own Japanese tokenizer with `BertWordPieceTokenizer`, I found that `[UNK]` tokens frequently appeared after tokenization. It took me some time to conclude that this was...

Stale

This is not a question, just my notes. For sentence correction, CGMH requires an additional `en` package to handle typos and tense errors, which does not seem to be distributed by...

I know that SciBERT is pre-trained on the Semantic Scholar corpus. I also know that the Semantic Scholar corpus is not publicly available. I am wondering how many new papers...

## What you want to add

I would suggest using `String.prototype.matchAll()`, which returns an iterator over all matches of a global regular expression in a single call. The results of the second and subsequent matches will look...

enhancement

When evaluating on JSQuAD, the prediction JSON file is written with Unicode characters escaped. This makes it difficult to check model outputs. It would be better to add `ensure_ascii=False` in...
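The effect of `ensure_ascii` can be seen with Python's standard `json` module. This is a minimal sketch: the `predictions` dictionary and the `predictions.json` file name are hypothetical stand-ins, not the actual JSQuAD evaluation script or its output format.

```python
import json
import os
import tempfile

# Hypothetical predictions: question ID -> predicted answer text (Japanese).
predictions = {"q1": "東京"}

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(predictions)
print(escaped)  # {"q1": "\u6771\u4eac"}

# With ensure_ascii=False, the characters are written as-is and stay readable.
readable = json.dumps(predictions, ensure_ascii=False)
print(readable)  # {"q1": "東京"}

# When writing to a file, open it with an explicit UTF-8 encoding so the
# unescaped characters are encoded correctly regardless of the platform default.
path = os.path.join(tempfile.mkdtemp(), "predictions.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False)
```

Passing `encoding="utf-8"` matters once escaping is disabled: without it, `open()` uses the platform's default encoding, which may not be able to represent Japanese text.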

This is not a bug report but a personal reply to the comment `"# TODO: why do we need cpu here?"` at https://github.com/facebookresearch/BLINK/blob/5fe254dd64d37332347edc73738edcb56096183f/blink/biencoder/biencoder.py#L135-L144. This function is called repeatedly for each batch...