Daniel Mietchen
E.g. number of senses, alternative forms, statements, references, nouns, verbs, etc.
Showing both the existing and the non-existing lexemes is a good default, but it would be useful to be able to change it so that only one of these categories is shown. Showing...
Japanese example attached — the sentence ```下記方法で体内への侵入を防止すること``` from [here](https://ja.wikipedia.org/w/index.php?title=2019%E6%96%B0%E5%9E%8B%E3%82%B3%E3%83%AD%E3%83%8A%E3%82%A6%E3%82%A4%E3%83%AB%E3%82%B9%E3%81%AB%E3%82%88%E3%82%8B%E6%80%A5%E6%80%A7%E5%91%BC%E5%90%B8%E5%99%A8%E7%96%BE%E6%82%A3&oldid=76973353#%E5%80%8B%E4%BA%BA%E3%81%A7%E3%81%A7%E3%81%8D%E3%82%8B%E4%BA%88%E9%98%B2%E5%AF%BE%E7%AD%96) should be tokenized somewhat like the following, with a single pipe character standing for a word boundary, two for lexeme boundaries...
This would make it possible to better capture more complex constructs like [matrix-assisted laser desorption/ionization time-of-flight mass spectrometry](https://www.wikidata.org/w/index.php?sort=relevance&search=matrix-assisted+laser+desorption%2Fionization+time-of-flight+mass+spectrometry&title=Special%3ASearch&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns120=1) ([Q1792222](https://www.wikidata.org/wiki/Q1792222)). Ideally, the user could set lower and upper bounds for N.
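As a rough illustration of the bounded-N idea, here is a minimal Python sketch that enumerates all word n-grams between a lower and an upper bound. The function name `ngrams`, its parameters `min_n` and `max_n`, and the whitespace tokenization are assumptions made for this example; they are not taken from the text-to-lexemes code.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], min_n: int = 1, max_n: int = 3) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous n-gram with min_n <= n <= max_n (hypothetical helper)."""
    if min_n < 1 or max_n < min_n:
        raise ValueError("expected 1 <= min_n <= max_n")
    # Never ask for n-grams longer than the token list itself.
    for n in range(min_n, min(max_n, len(tokens)) + 1):
        for start in range(len(tokens) - n + 1):
            yield tuple(tokens[start:start + n])


# Naive whitespace tokenization, used only for illustration.
tokens = "time-of-flight mass spectrometry".split()
candidates = [" ".join(gram) for gram in ngrams(tokens, min_n=2, max_n=3)]
print(candidates)
```

Each candidate string could then be looked up as a possible multi-word lexeme, with the bounds keeping the number of lookups manageable for long inputs.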
It would be great if text-to-lexemes could just be given a URL and then use the HTML as input. To avoid issues with copyright or data mining, it probably makes sense to...
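One possible shape for the URL input, sketched with the Python standard library only: fetch the page, drop markup as well as script and style content, and pass the remaining text on. The names `TextExtractor`, `html_to_text` and `fetch_text` are hypothetical, and how text-to-lexemes would actually consume the result is not specified here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its space-joined text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url: str) -> str:
    """Download a page and return its text (assumes UTF-8; not called in this sketch)."""
    with urlopen(url) as response:
        return html_to_text(response.read().decode("utf-8", errors="replace"))


sample = "<html><head><style>p{}</style></head><body><p>Hello <b>world</b></p><script>x=1</script></body></html>"
text = html_to_text(sample)
print(text)
```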
Currently, both are shown, but depending on the use case and input, it would be useful to be able to filter out one of these categories.
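The requested filter could look roughly like the following sketch. It assumes, purely for illustration, that each result is a pair of a word and either a lexeme identifier or `None` when no lexeme exists; the `show` parameter, the `filter_matches` function and the placeholder identifiers are hypothetical.

```python
from typing import List, Optional, Tuple

# (word, lexeme identifier or None if no matching lexeme exists)
Match = Tuple[str, Optional[str]]


def filter_matches(matches: List[Match], show: str = "all") -> List[Match]:
    """Keep all matches, only existing lexemes, or only missing ones."""
    if show == "all":
        return list(matches)
    if show == "existing":
        return [m for m in matches if m[1] is not None]
    if show == "missing":
        return [m for m in matches if m[1] is None]
    raise ValueError("show must be 'all', 'existing' or 'missing'")


# Placeholder identifiers, not real lexeme IDs.
matches = [("the", "L-placeholder-1"), ("analysed", None), ("text", "L-placeholder-2")]
missing_only = filter_matches(matches, show="missing")
existing_words = [word for word, _ in filter_matches(matches, show="existing")]
print(missing_only, existing_words)
```

Keeping `"all"` as the default preserves the current behaviour while letting users narrow the output when they only care about gaps or only about coverage.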
E.g. https://tools.wmflabs.org/ordia/text-to-lexemes?text=analysed is not currently recognized.
E.g. as per https://twitter.com/EvoMRI/status/1136312031693328385
I am looking into using the Hub for identifier mapping at the scale of thousands of items, so I would like to paginate in batches of hundreds or so....
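On the client side, splitting a large identifier list into batches of a few hundred can be sketched as below. This is generic Python and makes no assumption about the Hub's actual API; the `batched` helper and the placeholder identifiers are illustrative only.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Placeholder identifiers standing in for the real ones to be mapped.
ids = [f"ID{i}" for i in range(1, 1001)]
batches = list(batched(ids, 300))
print([len(batch) for batch in batches])
```

Each batch would then be sent as one request, which keeps individual requests small and makes it easy to resume after a failure.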
On https://wdumps.toolforge.org/status, the words "oldest" and "newest" are used with the inverse of their meaning: