ordia
Wikidata lexemes presentations
Japanese example attached — the sentence ```下記方法で体内への侵入を防止すること``` ("prevent entry into the body by the methods below") from [here](https://ja.wikipedia.org/w/index.php?title=2019%E6%96%B0%E5%9E%8B%E3%82%B3%E3%83%AD%E3%83%8A%E3%82%A6%E3%82%A4%E3%83%AB%E3%82%B9%E3%81%AB%E3%82%88%E3%82%8B%E6%80%A5%E6%80%A7%E5%91%BC%E5%90%B8%E5%99%A8%E7%96%BE%E6%82%A3&oldid=76973353#%E5%80%8B%E4%BA%BA%E3%81%A7%E3%81%A7%E3%81%8D%E3%82%8B%E4%BA%88%E9%98%B2%E5%AF%BE%E7%AD%96) should be tokenized somewhat like the following, with a single pipe character standing for a word boundary and two pipes for a lexeme boundary...
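The proposed notation can be sketched as follows; the boundary placements below are illustrative only (a real tool would obtain them from a morphological analyzer), and the helper names are my own:

```python
# Hypothetical segmentation of the example sentence in the proposed
# notation: "|" marks a word boundary, "||" a lexeme boundary.
# The exact boundary placements here are illustrative, not authoritative.
segmented = "下記||方法|で|体内|へ|の|侵入|を|防止||する|こと"

# Recover the individual tokens by treating both boundary types alike.
tokens = segmented.replace("||", "|").split("|")
print(tokens)
```

Each recovered token could then be matched against existing lexeme forms.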
Ordia currently does not support Chinese at all. Proper support will need #95, of course...
This makes it harder to enter affixes, e.g. the Swedish prefixes listed at https://sv.wikipedia.org/wiki/Lista_%C3%B6ver_prefix_i_svenskan
https://www.wikidata.org/wiki/User:Mahir256/syndepgraph.js
Add a possible link to Bodh, a Tabernacle-like tool for lexemes, available at https://bodh.toolforge.org/ and documented at https://www.wikidata.org/wiki/Wikidata:Bodh
This would make it possible to better capture more complex constructs such as [matrix-assisted laser desorption/ionization time-of-flight mass spectrometry](https://www.wikidata.org/w/index.php?sort=relevance&search=matrix-assisted+laser+desorption%2Fionization+time-of-flight+mass+spectrometry&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns120=1) ([Q1792222](https://www.wikidata.org/wiki/Q1792222)). Ideally, the user could set lower and upper bounds for N.
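A minimal sketch of N-gram extraction with user-set bounds; the function and parameter names (`ngrams`, `n_min`, `n_max`) are hypothetical, not Ordia's actual API:

```python
def ngrams(tokens, n_min, n_max):
    """Yield every N-gram with n_min <= N <= n_max.

    n_min and n_max are the user-settable lower and upper bounds
    for N mentioned above (hypothetical parameter names).
    """
    for n in range(n_min, n_max + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])

phrase = "time of flight mass spectrometry".split()
print(list(ngrams(phrase, 4, 5)))
```

Longer N-grams could then be checked against multi-word lexemes before falling back to shorter ones.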
For a larger text, it would be useful to know how many times each word occurs, so work could focus on the more common words. If the extracted list is...
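Counting occurrences could be sketched with the standard library; the naive `\w+` tokenization below is an assumption for illustration (a real implementation would need language-aware tokenization):

```python
import re
from collections import Counter

def word_frequencies(text):
    # Naive tokenization on word characters; real support would need
    # language-aware tokenization (cf. the Japanese example above).
    words = re.findall(r"\w+", text.lower())
    return Counter(words)

freqs = word_frequencies("To be, or not to be, that is the question.")
print(freqs.most_common(2))  # the most frequent words come first
```

Sorting the extracted list by frequency would let editors create the most common missing lexemes first.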
Add a link to search in Smurf, the Danish newspaper search facility: http://labs.statsbiblioteket.dk/smurf/
This could evolve into a considerably better UI than the current Vue UI.
Provide an option to translate the tool, with a separate URL for each language code.
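One possible URL scheme puts the language code in the first path segment; everything below (the domain, the set of codes, the function name) is a hypothetical sketch, not Ordia's actual routing:

```python
from urllib.parse import urlsplit

# Illustrative set of supported interface language codes.
SUPPORTED = {"en", "da", "sv", "ja"}

def language_from_url(url, default="en"):
    """Pick the interface language from the URL's first path segment,
    falling back to a default when no supported code is present."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if segments and segments[0] in SUPPORTED:
        return segments[0]
    return default

print(language_from_url("https://ordia.toolforge.org/da/L2"))  # da
print(language_from_url("https://ordia.toolforge.org/L2"))     # en (fallback)
```

Keeping the language in the URL makes each translation directly linkable and cacheable, at the cost of slightly longer URLs.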