
Support for Wiktionary files

Open page200 opened this issue 3 years ago • 6 comments

Support for Wiktionary would be great. There are different languages, here for example are Portuguese files: https://dumps.wikimedia.org/ptwiktionary/20200901/

page200 avatar Sep 19 '20 20:09 page200

Thank you for your suggestion.

I checked the data, but I think it is fine for users to convert the data into TSV themselves and then import it into Mouse Dictionary. (Wiktionary -> TSV -> Mouse Dictionary)

That's because, as far as I could see, each Wiktionary entry can be very large, and the format is not necessarily suitable for the Mouse Dictionary view. For instance, it contains a lot of Wikimedia-specific markup that Mouse Dictionary doesn't handle.

For that reason, I think a good solution for the moment is for users to convert the XML file into a TSV file in whatever way they like, and then import it.
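
As a rough illustration of the Wiktionary -> TSV step, here is a minimal Python sketch. It is not an official converter. It assumes the dump is a standard MediaWiki XML export with `<page>`, `<title>` and `<text>` elements, and that the import expects one `headword<TAB>definition` pair per line. The `MAX_CHARS` cap, the helper names, the sample data, and the very crude regex-based markup stripping are all hypothetical choices made for this example.

```python
import io
import re
import xml.etree.ElementTree as ET

MAX_CHARS = 500  # hypothetical cap, since Wiktionary entries can be very large


def local_name(tag):
    """Drop the XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def strip_markup(text):
    """Very rough cleanup of wiki markup; real entries need more care."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)  # drop non-nested {{templates}}
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)  # [[a|b]] -> b
    text = re.sub(r"'{2,}", "", text)  # bold/italic quote runs
    text = re.sub(r"<[^>]+>", "", text)  # leftover HTML-like tags
    return re.sub(r"\s+", " ", text).strip()  # also removes tabs and newlines


def dump_to_tsv_rows(xml_stream):
    """Yield 'headword<TAB>definition' lines from a MediaWiki XML stream."""
    title, text = None, None
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            # Assumption: titles containing ':' are non-entry pages (help, templates).
            if title and text and ":" not in title:
                body = strip_markup(text)[:MAX_CHARS]
                if body:
                    yield f"{title}\t{body}"
            title, text = None, None
            elem.clear()  # free memory; dumps are large


# Tiny made-up sample standing in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>casa</title><revision><text>'''[[habitação]]''' {{m}} lugar onde se [[morar|mora]]</text></revision></page>
<page><title>Ajuda:Index</title><revision><text>skip</text></revision></page>
</mediawiki>"""

rows = list(dump_to_tsv_rows(io.BytesIO(SAMPLE.encode("utf-8"))))
```

For a real dump, the in-memory sample could be replaced by a file object, for example one returned by `bz2.open(path, "rb")` if the dump is bz2-compressed, and each row written out as one line of a UTF-8 text file.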

wtetsu avatar Sep 26 '20 17:09 wtetsu

Thanks for having a look!

Where can I find a script to convert Wiktionary to TSV, or an example of what the TSV format should look like?

The Wikimedia-specific markup can probably be ignored for now.

I'm looking forward to using Mouse Dictionary with my languages. :)

page200 avatar Sep 26 '20 18:09 page200

I don't know of such a tool, but I may develop one in the future.

wtetsu avatar Oct 12 '20 11:10 wtetsu

I know PyGlossary https://github.com/ilius/pyglossary, which supports the Zim format for Kiwix https://www.kiwix.org/.

GrimPixel avatar Jan 12 '23 01:01 GrimPixel

+1 and the Wikipedia dump would be an amazing enhancement too.

I agree that users might be able to do that on their own, of course. But as someone who actually did that for Spanish, I can say that parsing dictionary dump data was a hell of a lot of work. I'm sure many people would be happy if a plugin or something similar were available in the Mouse Dictionary space, for example a quick setup that walks users through the necessary configuration in a friendly interface to import a dataset such as the Wiktionary or Wikipedia dumps we are discussing here. Plugins like that would open up Mouse Dictionary's true potential to the world.

yuis-ice avatar May 18 '23 13:05 yuis-ice

It may be a good idea to leave to plug-ins what I cannot provide as standard functionality for various reasons, but as far as I know, Chrome extensions (Manifest V3) are not allowed to execute any code other than what is in the package 😕

https://developer.chrome.com/docs/extensions/mv3/intro/mv3-overview/#remotely-hosted-code

wtetsu avatar May 19 '23 03:05 wtetsu