tuja-vortaro
Add Wiktionary as a source
Wiktionary can be a powerful tool here. How are you making the sources machine-readable?
Manual DTD and XML parsing. For example, https://github.com/sstangl/tuja-vortaro/blob/master/revo/convert-to-js.py.
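For readers who have not opened the linked script, here is a rough sketch of what "XML to JavaScript" conversion can look like. It is not the code in convert-to-js.py. The sample XML is a made-up, heavily simplified stand-in for a dictionary file (the real ReVo markup is richer), the function names are hypothetical, and DTD handling is skipped entirely.

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample input. The element names are an assumption for this sketch,
# not a faithful copy of the ReVo schema.
SAMPLE = """<vortaro>
  <art><kap>hundo</kap><trd lng="en">dog</trd></art>
  <art><kap>kato</kap><trd lng="en">cat</trd></art>
</vortaro>"""


def xml_to_entries(xml_text, lang="en"):
    """Return [headword, [translations]] pairs for one target language."""
    root = ET.fromstring(xml_text)
    entries = []
    for art in root.iter("art"):
        headword = (art.findtext("kap") or "").strip()
        translations = [
            trd.text.strip()
            for trd in art.iter("trd")
            if trd.get("lng") == lang and trd.text
        ]
        # Skip articles with no headword or no translation in this language.
        if headword and translations:
            entries.append([headword, translations])
    return entries


def to_js(entries, var_name="entries"):
    """Serialize the entries as a JavaScript variable assignment."""
    # ensure_ascii=False keeps letters like ŭ readable in the output file.
    return "var %s = %s;" % (var_name, json.dumps(entries, ensure_ascii=False))
```

Calling `to_js(xml_to_entries(SAMPLE))` yields a single line of JavaScript that a page can load with a script tag.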
Wiktionary would be a good source in theory. It is much easier to edit than ReVo, and its data licensing would permit this program to be AGPLv3+, which I would like.
On the other hand, the data quality is much lower than that of either ReVo or ESPDIC. The entries that exist are extremely ill-specified, and basic words like taŭga are not found at all.
So I don't think switching the data source is a good idea until there is a team actively working on improving dictionary quality. Currently that momentum exists with ReVo, even though it is slight. I would very much like to see a libre version of PIV.
Isn't it possible to add more than one source and eliminate duplicates?
That would be more work, but it would be possible. I'm not opposed. It would certainly have the benefit that editing Wiktionary would become the quickest way to improve the quality of this dictionary, especially its translations.
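To make the idea concrete, merging several sources and dropping duplicates could look roughly like the sketch below. It assumes every source has already been converted to (headword, translations) pairs; the function names and the sample data in the usage note are made up for illustration. Treating headwords as equal when they match case-insensitively is also an assumption, and it would wrongly merge a proper noun with a common noun of the same spelling.

```python
import unicodedata


def normalize(word):
    """Key used to decide whether two strings count as the same word."""
    return unicodedata.normalize("NFC", word).strip().lower()


def merge_sources(sources):
    """Merge (source_name, entries) pairs, listed in priority order.

    entries is a list of (headword, [translations]). The first source to
    mention a headword decides its spelling; later sources only contribute
    translations that are not already present.
    """
    merged = {}
    for name, entries in sources:
        for headword, translations in entries:
            record = merged.setdefault(
                normalize(headword),
                {"headword": headword, "translations": [], "sources": []},
            )
            if name not in record["sources"]:
                record["sources"].append(name)
            seen = {normalize(t) for t in record["translations"]}
            for translation in translations:
                if normalize(translation) not in seen:
                    record["translations"].append(translation)
                    seen.add(normalize(translation))
    return merged
```

For example, with invented data, `merge_sources([("revo", [("hundo", ["dog"])]), ("wiktionary", [("Hundo", ["dog", "hound"])])])` produces one record for `hundo` that lists both sources and the translations `dog` and `hound`.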
In my view, the more sources, the better (more reliability). I'm also going to try to check with Yves Nevelsteen regarding the licensing of Komputeko. That would be a major, excellent addition.
I just looked into adding Wiktionary as a source. The quality is so low that it is difficult to tell which entries are even Esperanto words: there are entries for words in various languages. The pages themselves also do not have a consistent structure. It would be very difficult to make something useful out of this.