Sutou Kouhei
https://www.who.int/tb/country/data/download/en/
https://www.tensorflow.org/datasets/catalog/c4
http://cmp.felk.cvut.cz/~tylecr1/facade/ License: CC BY-SA (which version?)
See also: https://uribo.hatenablog.com/entry/2019/12/22/102452
https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs
Corpus of Annual Reports in Japan https://github.com/chakki-works/CoARiJ
https://github.com/UniversalDependencies/UD_Japanese-PUD
https://github.com/UniversalDependencies/UD_Japanese-GSD
https://www.geonames.org/export/ Example of use from PostgreSQL: https://www.2ndquadrant.com/en/blog/postgresql-12-implementing-k-nearest-neighbor-space-partitioned-generalized-search-tree-indexes/
Kyoto University Web Document Leads Corpus https://github.com/ku-nlp/KWDLC