
Further simplify fr.wiktionary content by removing duplicates

Open • Popolechien opened this issue 5 years ago • 1 comment

To keep the app at a somewhat manageable size, a lot of duplicates could be removed from the current http://library.kiwix.org/wiktionary_fr_app_nopic ZIM, e.g. all the past-tense or plural variations, which bring little to no value. Luckily, all such variations have pretty much the same structure: the article starts with a Forme de _X, Y or Z_ section (the underlying template/wikicode being {{S|type|language code|flexion}} as a level 3 section title).

[Screenshot, 2020-07-21 at 14:36:48]

Wiktionarians seem to have made it a rule not to give a definition for such words, so I guess they won't really be missed if we remove them.
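The flexion check described above could be sketched as follows, in TypeScript since that is what mwoffliner is written in. This is a minimal sketch, assuming the raw wikitext of each page is available and that inflected forms always use the `=== {{S|type|language code|flexion}} ===` heading quoted above; `level3Sections` and `isFlexionOnly` are hypothetical names, not existing mwoffliner functions. To avoid dropping titles that are both an inflected form and a word in their own right, it only flags a page when every word-class section on it (one carrying a language code) is a flexion.

```typescript
// Hypothetical helpers for illustration only; not part of mwoffliner.

interface SectionInfo {
  type: string;     // first positional parameter, e.g. "nom", "verbe", "étymologie"
  lang?: string;    // language code; absent for sections such as {{S|étymologie}}
  flexion: boolean; // true when a later positional parameter is "flexion"
}

// Collects every level-3 heading consisting only of an {{S|...}} template,
// e.g. "=== {{S|nom|fr|flexion}} ===". A level-4 heading ("====") does not
// match, because "===" must be followed directly by spaces/tabs and "{{".
function level3Sections(wikitext: string): SectionInfo[] {
  const heading = /^===[ \t]*\{\{S\|([^{}]*)\}\}[ \t]*===[ \t]*$/gm;
  const sections: SectionInfo[] = [];
  let match: RegExpExecArray | null;
  while ((match = heading.exec(wikitext)) !== null) {
    // Keep positional parameters only; drop named ones such as "num=2".
    const positional = match[1]
      .split("|")
      .map((p) => p.trim())
      .filter((p) => p.indexOf("=") === -1);
    sections.push({
      type: positional[0],
      lang: positional[1],
      flexion: positional.slice(2).indexOf("flexion") !== -1,
    });
  }
  return sections;
}

// A page is a removal candidate only if it has at least one word-class
// section and all of those sections are flexions.
function isFlexionOnly(wikitext: string): boolean {
  const wordClasses = level3Sections(wikitext).filter((s) => s.lang !== undefined);
  return wordClasses.length > 0 && wordClasses.every((s) => s.flexion);
}
```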

Alternatively, all root words (of interest) start with an Étymologie level 3 section title, which should make them easy to pick out when parsing.

[Screenshot, 2020-07-21 at 14:39:32]
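The alternative, keeping only pages that contain an Étymologie section, could be sketched like this. Again a minimal sketch, assuming access to each page's wikitext and that the section is written as `=== {{S|étymologie}} ===`; `Page`, `hasEtymologySection` and `selectRootWords` are hypothetical names, and whether every root word of interest really has this section (and no inflected form does) would need to be checked against the actual dump.

```typescript
// Hypothetical helpers for illustration only; not part of mwoffliner.

interface Page {
  title: string;
  wikitext: string; // raw wikitext of the page (assumed to be available)
}

// True when the page has a level-3 heading "=== {{S|étymologie}} ===".
// Optional extra template parameters are tolerated; "====" headings are not
// matched because "===" must be followed directly by spaces/tabs and "{{".
function hasEtymologySection(wikitext: string): boolean {
  const heading = /^===[ \t]*\{\{S\|étymologie(\|[^{}]*)?\}\}[ \t]*===[ \t]*$/m;
  return heading.test(wikitext);
}

// Keep-list approach: only titles whose page has an étymologie section are
// selected; everything else is left out of the ZIM.
function selectRootWords(pages: Page[]): string[] {
  return pages.filter((p) => hasEtymologySection(p.wikitext)).map((p) => p.title);
}
```

Being a keep-list, this is stricter than filtering on the flexion heading: any page lacking the Étymologie heading is dropped, whether or not it is actually an inflected form.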

Popolechien, Jul 21 '20 13:07