
Test failure with Korean

Open thierry-FreeBSD opened this issue 6 years ago • 36 comments

On FreeBSD, the test suite fails with this message:

======================================================================
FAIL: test_korean (test_itb.ItbTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/ports/textproc/ibus-typing-booster/work/ibus-typing-booster-2.0.0/tests/test_itb.py", line 378, in test_korean
    self.assertEqual(self.engine.mock_preedit_text, '안녕하세이')
AssertionError: '안녕세이' != '안녕하세이'
- 안녕세이
+ 안녕하세이
?   +


----------------------------------------------------------------------
Ran 17 tests in 156.142s

FAILED (failures=1)
FAIL run_tests (exit status: 1)

Any idea?

thierry-FreeBSD avatar May 29 '18 17:05 thierry-FreeBSD

On Fedora, the Korean hunspell dictionary was recently updated and therefore I had to adapt the test. Probably, FreeBSD still has the old Korean dictionary.

mike-fabian avatar May 30 '18 04:05 mike-fabian

See:

https://github.com/mike-fabian/ibus-typing-booster/commit/ca0ecc31567fa0333dda71c9145ff45d0b1ac2c3

mike-fabian avatar May 30 '18 05:05 mike-fabian

That change was already in ibus-typing-booster 1.5.37 though. Did the test case still work for you in ibus-typing-booster 1.5.37?

mike-fabian avatar May 30 '18 05:05 mike-fabian

The comment at the start of the Korean test case says that there is no Korean hunspell dictionary on FreeBSD and therefore the test is skipped:

def test_korean(self):
    if not itb_util.get_hunspell_dictionary_wordlist('ko_KR')[0]:
        # No Korean dictionary file could be found, skip this
        # test. On some systems, like 'Arch' or 'FreeBSD', there
        # is no ko_KR.dic hunspell dictionary available, therefore
        # there is no way to run this test on these systems.
        # On systems where a Korean hunspell dictionary is available,
        # make sure it is installed to make this test case run.
        # In the ibus-typing-booster.spec file for Fedora,
        # I have a “BuildRequires: hunspell-ko” for that purpose
        # to make sure this test runs when building the rpm package.
        return

Did you recently get a Korean hunspell dictionary on FreeBSD? So itb_util.get_hunspell_dictionary_wordlist('ko_KR')[0] now succeeds in loading a Korean hunspell dictionary?
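For illustration only, here is a minimal sketch of how such a guard could report the skip explicitly instead of returning silently. It is not how the ibus-typing-booster test is written: `load_wordlist` is a hypothetical stand-in for `itb_util.get_hunspell_dictionary_wordlist()` that pretends no dictionary is installed, and `KoreanTestSketch` and `run_sketch` are names made up for this example.

```python
import unittest


def load_wordlist(language):
    """Hypothetical stand-in for itb_util.get_hunspell_dictionary_wordlist().

    Returns (dic_path, dictionary_encoding, word_list); here it always
    pretends that no dictionary file was found.
    """
    return ('', 'UTF-8', [])


class KoreanTestSketch(unittest.TestCase):
    def test_korean(self):
        dic_path = load_wordlist('ko_KR')[0]
        if not dic_path:
            # skipTest() records the test as skipped, so the test report
            # shows that nothing was actually checked.
            self.skipTest('no ko_KR hunspell dictionary found')
        self.fail('not reached in this sketch')


def run_sketch():
    """Run the sketch test case and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KoreanTestSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == '__main__':
    outcome = run_sketch()
    print('skipped:', [reason for _test, reason in outcome.skipped])
```

With a plain `return`, the test counts as passed even though nothing ran; with `skipTest()` a packager would see in the summary whether the Korean dictionary was really picked up.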

mike-fabian avatar May 30 '18 05:05 mike-fabian

And the test case uses the 'ko-romaja' input method. Do you have it available? On Fedora it is in this package:

$ rpm -qf /usr/share/m17n/ko-romaja.mim
m17n-db-1.8.0-3.fc28.noarch

mike-fabian avatar May 30 '18 05:05 mike-fabian

Thanks for all these ideas!

  1. The Korean hunspell dictionary was old => I've just upgraded it, but still the same error
  2. You are right: v. 1.5.37 produces the same error
  3. I have a /usr/local/share/m17n/ko-romaja.mim file, installed by m17n-db-1.7.0, but no idea about it (I do not speak Korean) => I'll upgrade the port to the latest 1.8.0.

I shall keep investigating and let you know.

thierry-FreeBSD avatar May 30 '18 21:05 thierry-FreeBSD

If you add this test case to m17n_translit.py:

mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)
$ git diff 
diff --git a/engine/m17n_translit.py b/engine/m17n_translit.py
index dea78a2..8dfab5c 100644
--- a/engine/m17n_translit.py
+++ b/engine/m17n_translit.py
@@ -257,6 +257,10 @@ class Transliterator:
     >>> trans.transliterate(['n', 'i', '3', 'h', 'a', 'o', '3'])
     '你好'
 
+    >>> trans = Transliterator('ko-romaja')
+    >>> trans.transliterate(list('annyeonghaseyo'))
+    '안녕하세요'
+
     If initializing the transliterator fails, for example
     because a non-existing input method was given as the argument,
     a ValueError is raised:
mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)

Does this work? You can test like this:

mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)
$ python3 m17n_translit.py 
mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)

If you see no output from “python3 m17n_translit.py”, then it works.
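The "no output means success" behaviour is what Python's doctest module does by default. A minimal sketch, assuming that m17n_translit.py runs its docstring examples through doctest when executed as a script (the failure reports later in this thread look like doctest output); `shout` and `run_examples` are hypothetical names used only here:

```python
import doctest


def shout(text):
    """Toy function standing in for a real transliterator.

    >>> shout('annyeong')
    'ANNYEONG'
    """
    return text.upper()


def run_examples(func):
    """Run the docstring examples of func and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == '__main__':
    # Prints nothing when every example passes; a failing example
    # would produce a "Failed example:" report instead.
    doctest.testmod()
```

Running the file directly prints nothing as long as the example in the docstring passes.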

mike-fabian avatar May 31 '18 07:05 mike-fabian

I have upgraded libofx, m17n-db and m17n-lib to be sure that they are not the source of the problem, and the examples in m17n_translit.py fail, not only the Korean one. They produce a NULL pointer access, see the attached log. The problem seems to be located in m17n-lib, I'm going to check it. m17n_translit.log

thierry-FreeBSD avatar May 31 '18 20:05 thierry-FreeBSD

Did you find anything? Does ibus-m17n work for you?

mike-fabian avatar Jun 06 '18 09:06 mike-fabian

(Sorry for the delay...) ibus-m17n seems to be working, at least for the languages that I can understand. But I have noticed a strange thing: m17n-db installs 166 .mim files and 108 .lnm files, but ibus does not see most of them:

  • right click on the language switcher icon, and click preferences
  • go to the "Input Method" tab and click Add
  • the initial list only contains 7 languages
  • pressing the "..." button at the bottom shows some more categories of languages (33), but many are still missing! Among the missing ones, I cannot find anything for Korean. Do I need to install another package?

thierry-FreeBSD avatar Jun 10 '18 16:06 thierry-FreeBSD

Apparently you are using ibus-setup to look for the m17n input methods. That is OK but it is probably easier to see what is available from the command line:

ibus list-engine

lists all engines ibus offers (same list you see in ibus-setup). But you can easily grep in the output. For example:

$ ibus list-engine | grep m17n: | wc
    163     653    5917

shows that ibus offers 163 engines from ibus-m17n.

Korean is not among them:

$ ibus list-engine | grep m17n:ko
  m17n:kok:inscript2 - inscript2 (m17n)

The reason is that the m17n:ko:* engines are not considered useful as there is also ibus-hangul specialized for Korean:

$ /usr/libexec/ibus-engine-m17n --xml | grep rank | wc
ibus-m17n-Message: 09:40:33.779: skipped m17n:ja:anthy since its rank is lower than 0
ibus-m17n-Message: 09:40:33.785: skipped m17n:zh:py since its rank is lower than 0
ibus-m17n-Message: 09:40:33.785: skipped m17n:ru:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:he:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:ko:romaja since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:ko:han2 since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:sk:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:sr:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.787: skipped m17n:kk:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.787: skipped m17n:hr:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:cmc:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:hy:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:uk:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:el:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:lo:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:my:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:ug:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:cs:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:ka:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.792: skipped m17n:uz:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.792: skipped m17n:be:kbd since its rank is lower than 0
    163     163    4401

These engines which rank lower than 0 are omitted on purpose because something better seems to be available as a separate engine (m17n:ja:anthy, m17n:zh:py, m17n:ko:romaja, ...) or they are just simulations of keyboard layouts. For example m17n:cs:kbd simulates a Czech keyboard layout on top of a US keyboard layout.

But m17n input methods which are not considered useful to offer via ibus-m17n might still be useful for ibus-typing-booster. ibus-typing-booster cannot use ibus-hangul but it can use /usr/share/m17n/ko-romaja.mim. And the keyboard layout simulations can be useful with ibus-typing-booster as well, for example if you want to type Czech and English at the same time with ibus-typing-booster, it makes sense to set a US English keyboard layout and add cs-kbd in ibus-typing-booster so you can type both languages and get completions without having to change keyboard layouts.
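To see which input methods ibus-m17n hides, the "skipped" messages shown above can be parsed with a few lines of Python. This is only a sketch: the regular expression is written against the message format quoted in this comment, and `skipped_engines` and `mim_name` are hypothetical helper names. The mapping from `m17n:ko:romaja` to `ko-romaja` follows the names used in this thread (ko-romaja.mim, cs-kbd).

```python
import re

# Matches the ibus-m17n messages quoted above, for example:
# "... skipped m17n:ko:romaja since its rank is lower than 0"
SKIPPED = re.compile(r'skipped (m17n:\S+) since its rank is lower than 0')


def skipped_engines(log_text):
    """Return the engine names that ibus-m17n reported as skipped."""
    return SKIPPED.findall(log_text)


def mim_name(engine):
    """Turn 'm17n:ko:romaja' into 'ko-romaja'."""
    _prefix, language, name = engine.split(':', 2)
    return f'{language}-{name}'


SAMPLE = """\
ibus-m17n-Message: 09:40:33.786: skipped m17n:ko:romaja since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:cs:kbd since its rank is lower than 0
"""

if __name__ == '__main__':
    print([mim_name(engine) for engine in skipped_engines(SAMPLE)])
```

The resulting names are the ones one would look for as .mim files in the m17n directory when deciding what to add in ibus-typing-booster.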

mike-fabian avatar Jun 11 '18 07:06 mike-fabian

You write: “ibus-m17n seems working, at least for the languages that I can understand.”

That makes it a bit mysterious why the examples in m17n_translit.py fail.

Could you find out more why these examples are failing?

mike-fabian avatar Jun 11 '18 07:06 mike-fabian

I checked on FreeBSD now and not all tests in m17n_translit.py fail for me.

Actually only those using inscript2 and the Korean one fail.

That the inscript2 tests fail is because you don’t have the inscript2 input methods for m17n installed. They are included in the m17n-db package in Fedora, but apparently not on most distributions. They are available here:

https://releases.pagure.org/inscript2/inscript2-20160423.tar.gz
$ tar xvf inscript2-20160423.tar.gz
x inscript2/
x inscript2/icons/
...

And then:

root@freebee:/usr/home/mfabian/inscript2 # cp  icons/* /usr/local/share/m17n/icons/
root@freebee:/usr/home/mfabian/inscript2 # cp IM/* /usr/local/share/m17n/

After doing that, only the Korean test still fails:

$ pwd
/usr/home/mfabian/ibus-typing-booster/engine
$ python3 m17n_translit.py
**********************************************************************
File "m17n_translit.py", line 261, in __main__.Transliterator
Failed example:
    trans.transliterate(list('annyeonghaseyo'))
Expected:
    '\uc548\ub155\ud558\uc138\uc694'
Got:
    '\uc548\ub155\uc138\u315b'
**********************************************************************
1 items had failures:
   1 of  31 in __main__.Transliterator
***Test Failed*** 1 failures.
$ 

mike-fabian avatar Jun 11 '18 12:06 mike-fabian

That failure of the Korean test on FreeBSD is quite weird:

Expected: '\uc548\ub155\ud558\uc138\uc694'
Got: '\uc548\ub155\uc138\u315b'

Translating the hex codes into the real characters, this is:

Expected: 안녕하세요
Got: 안녕세ㅛ
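For anyone who does not read Korean, the two strings can be compared character by character with Python's unicodedata module. A small sketch; `describe` is a helper name made up for this example:

```python
import unicodedata

EXPECTED = '\uc548\ub155\ud558\uc138\uc694'
GOT = '\uc548\ub155\uc138\u315b'


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(char, f'U+{ord(char):04X}', unicodedata.name(char))
            for char in text]


if __name__ == '__main__':
    for label, text in (('Expected', EXPECTED), ('Got', GOT)):
        print(f'{label}: {text}')
        for char, code_point, name in describe(text):
            print(f'  {code_point} {char} {name}')
```

The listing makes two differences visible: the syllable U+D558 is missing from the result, and the result ends in the standalone letter U+315B where a full syllable (U+C694) was expected.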

mike-fabian avatar Jun 11 '18 13:06 mike-fabian

[Screenshot: ko-romaja-available-in-gnome-on-freebsd]

As this screenshot shows, I do see the ko-romaja m17n input method in the gnome-control-center in FreeBSD.

mike-fabian avatar Jun 11 '18 14:06 mike-fabian

[Screenshot: ko-romaja-from-ibus-m17n-does-not-work-correctly-on-freebsd]

I added "Korean (romaja (m17n))" in the gnome-control-center, selected this input method and typed "annyeonghaseyo" into gedit.

The result is the same as in the m17n_translit test case, i.e. one gets

안녕세ㅛ

instead of

안녕하세요

This seems clearly wrong, but as it is the same error when using ibus-m17n and when trying to execute the m17n_translit.py test cases, I think it has nothing to do with ibus-typing-booster. As the same error occurs when using ibus-m17n, this looks like an error in m17n-lib and/or m17n-db.

mike-fabian avatar Jun 11 '18 14:06 mike-fabian

[Screenshot: ibus-typing-booster-2.0.1-works-on-freebsd]

I ignored the error in the Korean test case for the moment and tried instead whether I can successfully install ibus-typing-booster 2.0.1 on FreeBSD and make it work.

I had to do two small fixes to make it work:

commit cde1d57ad70158b1e01f1a681f5d863a39bc7379
Author: Mike FABIAN [email protected]
Date: Mon Jun 11 17:39:58 2018 +0200

    Fix some bugs in the usage of “prefix”

    To make ./configure --prefix=... actually work for prefixes other than "usr".

    I found that it didn’t work for --prefix=/usr/local which is used on FreeBSD.

commit eeeb2a7ffeab32a48262d373f76976ab156133f7
Author: Mike FABIAN [email protected]
Date: Mon Jun 11 15:32:38 2018 +0200

    Make itb_util.get_ime_help() work on FreeBSD

    The .mim files are in '/usr/local/share/m17n' on FreeBSD, they
    are in '/usr/share/m17n' on Fedora and openSUSE.

This will be in the 2.0.1 release.
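Roughly, the idea behind the second fix is to look in more than one place for the .mim files. The following is only a sketch of that idea, not the code from the commit: `find_mim_file` is a hypothetical helper, and the search order of the two directories is an assumption.

```python
import os

# The two locations mentioned in the commit message above.
# Trying them in this order is an assumption of this sketch.
M17N_DIRS = ('/usr/share/m17n', '/usr/local/share/m17n')


def find_mim_file(ime_name, dirs=M17N_DIRS):
    """Return the path of '<ime_name>.mim' in the first directory
    that contains it, or None if no directory has it."""
    for directory in dirs:
        path = os.path.join(directory, ime_name + '.mim')
        if os.path.isfile(path):
            return path
    return None


if __name__ == '__main__':
    print(find_mim_file('ko-romaja'))
```

On FreeBSD this would be expected to find the file under /usr/local/share/m17n, and on Fedora or openSUSE under /usr/share/m17n, provided m17n-db is installed.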

mike-fabian avatar Jun 11 '18 16:06 mike-fabian

Many points!

  1. Your message about ibus list-engine: On my desktop, ibus list-engine only lists lines beginning with "xkb:", then ibus list-engine | grep m17n is empty. Am I missing a configuration step?

  2. inscript2: I did not know about this one! Your link is related to a Red Hat package, and the links in their README are dead; do you know if there is a homepage? I'm going to make an official port for FreeBSD, that will solve the problem with Hindi.

  3. About the Korean test: Did you install m17n from the ports or the packages? ATM they install version 1.7.0. I have submitted a patch to upgrade them to 1.8.0 (available at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228648 ) but I'm waiting for maintainer's approval before committing them. And reading the changelog, it seems that 1.8.0 should fix some points (anyway, 1.8.0 is installed on my desktop, and the ITB tests still fail).

  4. About gnome-control-center: ATM I have no Gnome installed, only KDE, and I am doing my tests with Ibus preferences. KDE Keyboard Settings offers 3 different ways for Hangul, but I do not know if they are related to m17n: [Screenshot: capture_keyboard]

  5. About the prefix / localbase issues: They were already handled by the port, but thanks for that: it will simplify it!

Many thanks for your time and all these ideas. I was too busy to check it today, but I shall work on it again ASAP!

thierry-FreeBSD avatar Jun 11 '18 20:06 thierry-FreeBSD

You write: “On my desktop, ibus list-engine only lists lines beginning with "xkb:", then ibus list-engine | grep m17n is empty.”

I just did “pkg install m17n-lib”, “pkg install m17n-db” and “pkg install ibus-m17n”.

Then restarted the gnome session to restart ibus.

(“ibus restart” should also work and then there is no need to restart the Gnome session. But “ibus restart” seemed not to work correctly on the FreeBSD version where I tried it, the input into the gnome-terminal stopped working after “ibus restart” so I restarted the whole Gnome session instead).

mike-fabian avatar Jun 11 '18 20:06 mike-fabian

You write: “inscript2 I did not know about this one! Your link is related to a Red Hat package, and the links in their README are dead; do you know if there is a homepage? I'm going to make an official port for FreeBSD, that will solve the problem with Hindi.”

I don’t know of any other home page than:

https://releases.pagure.org/inscript2/

That work was done by Red Hat. I pinged the guy who did that today and asked him to get it upstreamed so it will be included in the next m17n-db release. Apparently he had not upstreamed it yet only because there were still problems with the inscript2 standard; the Indian government was apparently a bit slow in releasing it. But now it seems mature enough, and he said there is no reason against upstreaming it and he will do it soon.

In my ibus-typing-booster package for openSUSE, I just include that inscript2-20160423.tar.gz tar ball, see: https://build.opensuse.org/package/show/M17N/ibus-typing-booster As soon as this is included in a future release of m17n-db, I’ll remove that.

Of course ibus-typing-booster still works without that, but you can use only inscript and not inscript2 then for Indian languages.

mike-fabian avatar Jun 11 '18 20:06 mike-fabian

You wrote: “Did you install m17n from the ports or the packages? ATM they install version 1.7.0.”

I used “pkg install m17n-lib”, “pkg install m17n-db”, and “pkg install ibus-m17n”. This gave me the following versions:

ibus-m17n-1.3.4.16 m17n-db-1.7.0 m17n-lib-1.7.0_2

mike-fabian avatar Jun 11 '18 20:06 mike-fabian

What you write about the Korean settings in KDE looks like keyboard settings, it seems related to whether you use a "real" Korean keyboard which is almost like the US English keyboard but with some extra keys "Hanja/Hangul". If one uses a regular US English keyboard to write Korean, that should work fine but you probably have to choose to use one of the alternatives to the real "Hanja/Hangul" keys.

This has nothing to do with the "ko-romaja" input method from m17n.

mike-fabian avatar Jun 11 '18 20:06 mike-fabian

Oops... I just noticed that after upgrading m17n-lib and m17n-db, I forgot to reinstall ibus-m17n! Now the problem with ibus list-engine is solved.

thierry-FreeBSD avatar Jun 11 '18 20:06 thierry-FreeBSD

Could you find anything which causes the problem? The Korean test case works fine for me on Fedora 28, so I guess this is a problem specific to FreeBSD.

mike-fabian avatar Jun 27 '18 07:06 mike-fabian

Yes, it is surely specific to FreeBSD. I have tried many things, without success for the moment. Besides this test, everything seems OK, and ibus-typing-booster is working fine. (sorry for the delay)

thierry-FreeBSD avatar Jul 01 '18 16:07 thierry-FreeBSD

The Korean test is the only one which is failing when doing

python3 m17n_translit.py

?

mike-fabian avatar Jul 01 '18 20:07 mike-fabian

Its output is:

**********************************************************************
File "m17n_translit.py", line 261, in __main__.Transliterator
Failed example:
    trans.transliterate(list('annyeonghaseyo'))
Expected:
    '\uc548\ub155\ud558\uc138\uc694'
Got:
    '\uc548\ub155\uc138\u315b'
**********************************************************************
1 items had failures:
   1 of  31 in __main__.Transliterator
***Test Failed*** 1 failures.

thierry-FreeBSD avatar Jul 01 '18 20:07 thierry-FreeBSD

Did you find anything new here?

mike-fabian avatar Dec 09 '18 06:12 mike-fabian

I've just upgraded ITB to 2.3.1, and launched the tests again. Meanwhile, many dependencies (ibus, etc.) have been upgraded. But unfortunately, the Korean test still ends with the message:

======================================================================
FAIL: test_korean (test_itb.ItbTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/ports/textproc/ibus-typing-booster/work/ibus-typing-booster-2.3.1/tests/test_itb.py", line 407, in test_korean
    self.assertEqual(self.engine.mock_preedit_text, '안녕하세이')
AssertionError: '안녕세이' != '안녕하세이'
- 안녕세이
+ 안녕하세이
?   +


----------------------------------------------------------------------
Ran 20 tests in 183.269s

FAILED (failures=1)
FAIL run_tests (exit status: 1)

thierry-FreeBSD avatar Dec 09 '18 11:12 thierry-FreeBSD

I just upgraded to 2.6.5, and the above error with Korean is still there, but there is also one more failure:

======================================================================
FAIL: test_accent_insensitive_matching_french_dictionary (test_itb.ItbTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/ports/textproc/ibus-typing-booster/work/ibus-typing-booster-2.6.5/tests/test_itb.py", line 577, in test_accent_insensitive_matching_french_dictionary
    'différemment')
AssertionError: 'différemment po:adv' != 'différemment'
- différemment po:adv
?             -------
+ différemment

thierry-FreeBSD avatar Aug 29 '19 18:08 thierry-FreeBSD

Thierry Thomas [email protected] wrote:

I just upgraded to 2.6.5, and the above error with Korean is still there, but there is also one more failure:

======================================================================
FAIL: test_accent_insensitive_matching_french_dictionary (test_itb.ItbTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/ports/textproc/ibus-typing-booster/work/ibus-typing-booster-2.6.5/tests/test_itb.py", line 577, in test_accent_insensitive_matching_french_dictionary
    'différemment')
AssertionError: 'différemment po:adv' != 'différemment'
- différemment po:adv
?             -------
+ différemment

grep po:adv /usr/share/myspell/fr_FR.dic

finds nothing in the French hunspell dictionary on my Fedora 30 system.

Maybe you have a different version of the French dictionary? Does your dictionary contain a line "différemment po:adv"?

-- 📧 Mike FABIAN [email protected] Lack of sleep is the enemy of good work.

mike-fabian avatar Aug 30 '19 08:08 mike-fabian

OK, this is the reason: I don't have myspell, but hunspell, and

$ grep différemment /usr/local/share/hunspell/fr_FR.dic
différemment po:adv
indifféremment/D'Q' po:adv

thierry-FreeBSD avatar Sep 02 '19 19:09 thierry-FreeBSD

That looks like a bug in the fr_FR.dic to me, because each line in such a dictionary should contain a word optionally followed by / and some flags used to generate additional inflected forms of that word. I am not sure what po:adv means, "adv" might mean "adverb". Anyway, “différemment po:adv” doesn’t seem to be a word, the “po:adv” part should not be part of the word. So it looks like the / which should separate the word from the extra information is missing on that line.

mike-fabian avatar Sep 02 '19 20:09 mike-fabian

This is a part of hunspell's specification: see p. 9 and 10 of https://grammalecte.net/_misc/hunspell4.pdf

thierry-FreeBSD avatar Sep 03 '19 19:09 thierry-FreeBSD

You are right, I’ll make the following change in the 2.6.6 release to make it work correctly with the newer French dictionaries:

diff --git a/engine/itb_util.py b/engine/itb_util.py
index 500471a..14d3832 100755
--- a/engine/itb_util.py
+++ b/engine/itb_util.py
@@ -2986,7 +2986,6 @@ def find_hunspell_dictionary(language):
     '''
     Find the hunspell dictionary file for a language
 
-
     :param language: The language of the dictionary to search for
     :type language: String
     :rtype: tuple of the form (dic_path, aff_path) where
@@ -3136,14 +3135,30 @@ def get_hunspell_dictionary_wordlist(language):
     # différemment     8
     # différence/1     2
     #
-    # Therefore, remove everthing following a '/' or a tab from a line
-    # to make the memory use of the word list a bit smaller and the
-    # regular expressions we use later to match words in the
+    # Newer French dictionaries downloaded from
+    #
+    # http://grammalecte.net/download/fr/hunspell-french-dictionaries-v6.4.1.zip
+    #
+    # even contain stuff like:
+    #
+    # différemment po:adv
+    # différence/S.() po:nom is:fem
+    #
+    # i.e. the separator between the word and the extra stuff
+    # can be a space instead of a tab.
+    #
+    # As far as I know, hunspell dictionaries never contain whitespace
+    # within the words themselves.
+    #
+    # Therefore, remove everything following a '/', ' ', or a tab from
+    # a line to make the memory use of the word list a bit smaller and
+    # the regular expressions we use later to match words in the
     # dictionary slightly simpler and maybe a tiny bit faster:
+    #
     word_list = [
         unicodedata.normalize(
             NORMALIZATION_FORM_INTERNAL,
-            re.sub(r'[/\t].*', '', x.replace('\n', '')))
+            re.sub(r'[/\t ].*', '', x.replace('\n', '')))
         for x in dic_buffer
     ]
     return (dic_path, dictionary_encoding, word_list)
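
The effect of the changed regular expression can be checked in isolation. This is a small sketch using the two patterns from the diff above and dictionary lines quoted in this thread; `strip_extra` is a hypothetical helper and unicodedata normalization is left out to keep it short.

```python
import re

OLD_PATTERN = re.compile(r'[/\t].*')    # before the change
NEW_PATTERN = re.compile(r'[/\t ].*')   # after the change, also cuts at a space


def strip_extra(line, pattern=NEW_PATTERN):
    """Remove the newline and everything after the first separator."""
    return pattern.sub('', line.replace('\n', ''))


LINES = [
    'différemment po:adv\n',
    'différence/S.() po:nom is:fem\n',
    "indifféremment/D'Q' po:adv\n",
    'différemment\t8\n',
]

if __name__ == '__main__':
    for line in LINES:
        print(repr(strip_extra(line, OLD_PATTERN)), '->',
              repr(strip_extra(line, NEW_PATTERN)))
```

Only the first line behaves differently: the old pattern keeps 'différemment po:adv' as the "word", which is what the failing French test reported, while the new pattern yields 'différemment'.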

mike-fabian avatar Sep 10 '19 11:09 mike-fabian

The fix for the new French dictionaries is included here:

https://github.com/mike-fabian/ibus-typing-booster/releases/tag/2.6.6

mike-fabian avatar Sep 11 '19 14:09 mike-fabian