Support DSL dictionaries (ABBYY Lingvo raw format)

Open ghost opened this issue 8 years ago • 3 comments

DSL specifications & docs

  • http://lingvo.helpmax.net/en/troubleshooting/dsl-compiler

Example

  • https://github.com/Tvangeste/SampleDSL
    • https://github.com/Tvangeste/SampleDSL/issues/1

Many Wiktionary dictionaries have already been converted to DSL format

  • https://github.com/open-dsl-dict

ghost avatar Nov 29 '17 03:11 ghost

As can be seen from the description page, this is a format that is completely unusable for realtime use and needs to be compiled for use. Thus adding support for it to the dictionary app itself does not make sense. At least not for anything above maybe a few 100 words. Support could be added to the conversion tool (which is a separate project/repo, DictionaryPC) but that is only useful for anyone able to use it, so I am not sure there is a point at all.

rdoeffinger avatar Nov 30 '17 20:11 rdoeffinger

> Thus adding support for it to the dictionary app itself does not make sense. At least not for anything above maybe a few 100 words.

I create small DSL dictionaries (each dictionary.dsl has about 10,000 words, with fewer than 30 words per card) for use in GoldenDict, and all of them work well without lag.

Could (theoretically) such dictionaries work well in Aard 2?

> this is a format that is completely unusable for realtime use and needs to be compiled for use.

What about supporting DSL files compressed with dictzip/idzip (dictionary.dsl.dz)?

  • https://github.com/dictzip/dictzip-java
  • https://github.com/bauman/python-idzip
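
For what it's worth, a dictzip file is still a valid gzip stream (the chunk table used for random access sits in the gzip extra field), so a plain sequential read does not need a special library. A minimal Python sketch, assuming the text is UTF-16 with a BOM, which may not hold for every dictionary; `read_dsl_text` is a hypothetical helper name, not part of any tool linked here:

```python
import gzip
from pathlib import Path


def read_dsl_text(path, encoding="utf-16"):
    """Return the full text of a .dsl or .dsl.dz file.

    Assumption: the text is UTF-16 with a BOM; pass another encoding if not.
    """
    path = Path(path)
    # dictzip output is gzip-compatible, so gzip.open can stream it.
    opener = gzip.open if path.suffix == ".dz" else open
    with opener(path, "rb") as fh:
        return fh.read().decode(encoding)
```

This only covers reading the whole file front to back. Jumping straight to one card would need the chunk table that dictzip stores, plus some index of where each card starts.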

Examples of .dsl and .dsl.dz dictionaries can be found in the "Open DSL Dict" repos:

  • https://github.com/open-dsl-dict

ghost avatar Dec 01 '17 02:12 ghost

To be honest, the biggest problem will probably be that I won't have time to implement it. However, if you could point to one or two specific dictionaries that are representative of your use case, that might help. Also, the size in MB would probably be a more relevant measure than the number of words.

My reference here is something like the English or French dictionaries in QuickDic, which are several hundred MB after compression and just cannot work in a format like DSL. And dictzip only helps with storage size; it would actually make all the other issues worse.

A few specific concerns I came across for this format:

  • The dictionary format is designed for a one-way translation (language A to language B) whereas QuickDic is designed for dictionaries that work in both directions. That will cause some UI issues.
  • The "cards" are not required to be sorted, so the only way to find a word involves searching the whole file, which will have horrible performance
  • This can be avoided by reading the whole file in memory at startup, but will result in slow startup. In addition, Android apps by default (at least on some devices) are limited to around 20 MB of memory usage. That means this won't work reliably for dictionaries larger than maybe 15 MB.
  • The syntax is both very complex, with a huge number of possible tags (apparently not corresponding to any standard notation), and loosely defined (e.g. tabs and spaces are both allowed and should be treated the same, multiple spaces are allowed but should be treated as one, etc.). This will reduce parsing speed quite a bit and increase the effort to implement. The specification is also badly written: the fact that a card can have multiple headwords is not mentioned in the definition of the card format but at a random place later on, which increases both the effort and the risk of implementing it.
  • There are includes in the syntax, which are not compatible with some ways of accessing data in Android (e.g. downloads, "share" functionality)
  • The dictionaries are not necessarily in Unicode; the use of code pages is allowed instead
  • Many of the tags and features specified (things like the "multimedia section") are clearly aimed at one specific application and will not integrate well into an app with a different UI
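
To make the lookup concern concrete: since cards are unsorted, a reader has to scan the file once and build its own headword index before it can answer any query. A rough Python sketch, assuming (from my reading of the format) that header lines start with `#`, headword lines start in the first column, and body lines start with a space or tab. `build_index` is a hypothetical name, and tags, includes and `{{...}}` comments are not handled at all:

```python
from collections import defaultdict


def build_index(lines):
    """Map each headword to the list of card bodies it belongs to."""
    index = defaultdict(list)
    heads, body = [], []

    def flush():
        # Attach the collected body to every headword of the current card.
        if heads and body:
            text = "\n".join(body)
            for head in heads:
                index[head].append(text)

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header directive
        if line[0] in " \t":
            # Body line: tabs and spaces are treated alike, runs collapse to one.
            body.append(" ".join(line.split()))
        else:
            if body:  # a headword following a body starts a new card
                flush()
                heads, body = [], []
            heads.append(line.strip())
    flush()
    return dict(index)
```

Note that this keeps every card body in memory, which is exactly the startup and memory trade-off described above. Storing file offsets instead of text would shrink the index, but would not remove the need for the initial full scan.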

Overall, the best realistic case is partial support: many features not supported, occasional issues with some dictionaries, slow startup times, and a limit of maybe 10 MB on dictionary size. If that is useful, the next step would be to look at how painful it would be to support more than one format in QuickDic, as some of the UI code is rather tightly coupled to the native dictionary format.

rdoeffinger avatar Dec 14 '17 18:12 rdoeffinger