[website] Inconsistent pagination
I have a simple function that builds Kaikki links from a word, an edition, and a target language, and I just realized it does not work because the page paths are sometimes (in fact, most of the time) language-dependent.
Consider:
- https://kaikki.org/dictionary/German/meaning/R/Ro/Rock.html
- https://kaikki.org/dictionary/English/meaning/R/Ro/Rock.html
- https://kaikki.org/elwiktionary/Greek/meaning/τ/τρ/τρέχω.html
- https://kaikki.org/dictionary/Greek/meaning/τ/τρ/τρέχω.html
But in every other edition I tried besides those two, the language name in the path is localized:
- https://kaikki.org/dewiktionary/Deutsch/meaning/R/Ro/Rock.html
- https://kaikki.org/dewiktionary/Neugriechisch/meaning/τ/τρ/τρίτος.html
instead of German/Greek.
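A minimal sketch of the kind of link builder described above, showing why a single naming rule cannot work. The layout (edition directory, language segment, "meaning", first letter, first two letters, word) is inferred only from the example URLs; the `LANG_SEGMENT` table and both helper functions are hypothetical, and the sketch does not cover any special handling Kaikki may apply to very short words or unusual characters.

```python
from urllib.parse import quote

# Hypothetical table: the language segment each edition uses in its paths,
# keyed by (edition code, language code). Filled in only from the examples above.
LANG_SEGMENT = {
    ("en", "de"): "German",
    ("en", "el"): "Greek",
    ("el", "el"): "Greek",          # Greek edition: English language name
    ("de", "de"): "Deutsch",        # German edition: German language name
    ("de", "el"): "Neugriechisch",
}


def edition_dir(edition: str) -> str:
    # Assumption: English lives under "dictionary", the rest under "<code>wiktionary".
    return "dictionary" if edition == "en" else f"{edition}wiktionary"


def kaikki_url(word: str, edition: str, lang_code: str) -> str:
    """Build a Kaikki word-page URL (sketch; layout inferred from the examples)."""
    lang = LANG_SEGMENT[(edition, lang_code)]
    path = "/".join(
        [edition_dir(edition), lang, "meaning", word[:1], word[:2], f"{word}.html"]
    )
    # quote() keeps "/" and percent-encodes non-ASCII characters as UTF-8.
    return "https://kaikki.org/" + quote(path)


print(kaikki_url("Rock", "en", "de"))
```

The lookup table is the part that has to grow per edition; if every edition used English names, it would collapse to a single code-to-name mapping.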
It makes sense for the English edition, but why does the Greek edition use English language names?
I would rather see English names used everywhere than switch Greek to Ελληνικά in the Greek edition, but I admit I am biased. Let me know what you think. It should be possible; otherwise I don't know how the Greek edition manages it...
I know I can use "All%20languages%20combined" in that position, but it adds noise when debugging.
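For comparison, a sketch of the workaround mentioned above: putting the edition-independent "All languages combined" segment where the language name would go. It assumes the rest of the path keeps the same layout as the example URLs; `edition_dir` and `kaikki_all_languages_url` are hypothetical helpers, not part of Kaikki or wiktextract.

```python
from urllib.parse import quote

ALL_LANGUAGES = "All languages combined"


def edition_dir(edition: str) -> str:
    # Assumption: English lives under "dictionary", the rest under "<code>wiktionary".
    return "dictionary" if edition == "en" else f"{edition}wiktionary"


def kaikki_all_languages_url(word: str, edition: str) -> str:
    """Sketch: same layout as the per-language pages, but with a fixed segment."""
    path = "/".join(
        [edition_dir(edition), ALL_LANGUAGES, "meaning", word[:1], word[:2], f"{word}.html"]
    )
    # quote() encodes the spaces as %20, giving "All%20languages%20combined".
    return "https://kaikki.org/" + quote(path)


print(kaikki_all_languages_url("Rock", "de"))
```

This avoids the per-edition name table entirely, at the cost of getting every language's entries for the word on one page.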
It would also be nice to rename "dictionary" to "enwiktionary", but IIRC that was rejected in another issue.
lang_name here: https://github.com/tatuylonen/wiktextract/blob/01fc53eff7d40fa7187e656439d58bed1692d32e/src/wiktextract/extractor/el/page.py#L117
should use the language section title text, or this line should be changed: https://github.com/tatuylonen/wiktextract/blob/01fc53eff7d40fa7187e656439d58bed1692d32e/src/wiktextract/extractor/el/page.py#L311
to code_to_name(lang_code, "el").
Or the other editions should have used the English language names instead of native language names. Either way, I don't think it's too late to change the Greek one (instead of changing a bunch of other extractors).
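A sketch of the two options being weighed, for the Greek extractor. The `_NAMES` table and the local `code_to_name` below are a stub standing in for the real `code_to_name(lang_code, "el")` call mentioned above; its fallback behaviour for unknown codes is an assumption of this sketch, and `page_lang_name` is a hypothetical helper.

```python
# Stub standing in for the real code_to_name(code, in_language); a few
# entries only, and returning "" for unknown codes is an assumption.
_NAMES = {
    ("el", "en"): "Greek",
    ("el", "el"): "Ελληνικά",
    ("de", "en"): "German",
    ("de", "el"): "Γερμανικά",
}


def code_to_name(code: str, in_language: str = "en") -> str:
    return _NAMES.get((code, in_language), "")


def page_lang_name(lang_code: str, section_title: str, localized: bool) -> str:
    """Pick the language name used for the Greek edition's page paths.

    localized=True matches the other editions (native names, e.g. "Ελληνικά");
    localized=False keeps the current behaviour (English names, e.g. "Greek").
    Falls back to the section title text when the code has no known name.
    """
    name = code_to_name(lang_code, "el" if localized else "en")
    return name or section_title
```

Either branch is a one-argument change at the call site; the harder part is whatever else in the extractor and website depends on the current English names.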
OK, it turns out that changing the Greek edition to use Greek names is more annoying than I thought; I'll do it later when I have time.
EDIT: Or, for consistency with the original edition, we could change this data (which is otherwise presented in English) to English.
This field is defined to hold the localized name... https://github.com/tatuylonen/wiktextract/blob/01fc53eff7d40fa7187e656439d58bed1692d32e/src/wiktextract/extractor/el/models.py#L234
I think I saved the original localized name to the lang field because the language code is derived from the language name or a template argument, but some language names cannot be converted to a code, and keeping the original text is slightly better than an "unknown" value.
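A minimal sketch of the fallback reasoning above: the localized name always goes into "lang", and only the code degrades to a placeholder when the name cannot be converted. The `_CODES` table and `parse_language` are hypothetical, and using the literal string "unknown" for the code is an assumption about what the placeholder looks like.

```python
# Hypothetical localized-name -> code table standing in for a real lookup.
_CODES = {"Ελληνικά": "el", "Γερμανικά": "de"}


def parse_language(section_title: str) -> dict[str, str]:
    """Keep the original localized text in "lang" even when no code is found,
    so that information survives instead of being replaced by a placeholder."""
    name = section_title.strip()
    return {"lang": name, "lang_code": _CODES.get(name, "unknown")}
```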
Yeah, I copy-pasted that from your extractors, so the description stayed. Of course, I didn't notice that at the time or realize what it would mean.