
Less priority for abbreviations (e.g. country names)

Open • woj-tek opened this issue 1 year ago • 8 comments

(There are a couple of issues about capitalization, but I think this one is somewhat different.)

Is your feature request related to a problem? Please describe.

Quite often (with very aggressive autocorrect and multi language), HeliBoard insists on inserting capitalized country names, so "This is no good" becomes "This is NO good".

Describe the solution you'd like

It would be better if those country abbreviations had lower priority when doing autosuggest :)

woj-tek, Aug 09 '24 07:08

I'm unable to reproduce the specific example you mentioned (what's "multi language"?), but in theory, I think the capitalised country names would be coming from the dictionary, so you could take the base word list (e.g. https://codeberg.org/Helium314/aosp-dictionaries/src/branch/main/wordlists/main_en_US.combined.gz), lower the priority of every word containing a capital letter with a script, and recompile the dictionary, no?
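A minimal Python sketch of the script suggested above, assuming the AOSP "combined" word list layout quoted later in this thread (one ` word=...,f=...,flags=...,originalFreq=...` line per entry) and assuming the word itself never contains a comma. The halving `factor` is an arbitrary, hypothetical choice; only the `f=` value is touched, and turning the result back into a binary dictionary is still left to the tool described in the aosp-dictionaries readme.

```python
import gzip
import re

# One word entry of the combined format, e.g. " word=OH,f=101,flags=abbreviation,originalFreq=101".
# Assumption: the word itself contains no comma.
ENTRY = re.compile(r"^(\s*word=)([^,]*),f=(\d+)(.*)$")


def demote_capitalized(lines, factor=0.5):
    """Yield lines, lowering f= for every word that contains an uppercase letter.

    Header, shortcut and bigram lines do not match ENTRY and pass through unchanged.
    `factor` is a hypothetical tuning knob, not a value used by HeliBoard.
    """
    for line in lines:
        newline = "\n" if line.endswith("\n") else ""
        m = ENTRY.match(line.rstrip("\n"))
        if m and any(ch.isupper() for ch in m.group(2)):
            new_f = int(int(m.group(3)) * factor)
            # Rebuild the line with the new f= value and keep its line ending.
            line = f"{m.group(1)}{m.group(2)},f={new_f}{m.group(4)}{newline}"
        yield line


def rewrite_wordlist(src_path, dst_path, factor=0.5):
    """Read a gzipped combined word list and write a demoted copy."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in demote_capitalized(src, factor):
            dst.write(line)
```

Note that this demotes every capitalized entry, including names and other proper nouns that users may actually want suggested.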

Symbiomatrix, Aug 12 '24 09:08

(what's "multi language"?)

I have Polish enabled, with English and Spanish as alternative autocorrect languages.

Unable to reproduce that specific example you've mentioned (…), but in theory, I think the capitalised country names would be coming from the dictionary,

Most likely, though it's somewhat weirder, and it happens somewhat randomly. Recently, quite often when I start a message with "oh, what a shame", the first word is always capitalised to "OH" (I guess the abbreviation for Ohio: https://en.wikipedia.org/wiki/Ohio).

Interestingly, I have Polish, English (UK!) and Spanish (generic).

(eg https://codeberg.org/Helium314/aosp-dictionaries/src/branch/main/wordlists/main_en_US.combined.gz), lower the priority of any word with a capital letter in a script, and recompile the dictionary, no?

Hmm... the point of reporting an issue is improving the application itself :)

I checked the dictionary (it's the US one), then got the UK/GB one and... there is no "OH" there. Then I got the Spanish one, still nothing, and finally I found this line in the Polish one:

 word=OH,f=101,flags=abbreviation,originalFreq=101

And the abbreviation flag caught my attention. I think I should rephrase the title to "less priority for abbreviations" :)
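Since the `abbreviation` flag is stored in the word list itself, one way to see which entries a change like this would affect is to list them. A minimal Python sketch, assuming the same combined layout as the `OH` line above, that no field value contains a comma, and that searching the `flags` value for the substring `abbreviation` is enough (how multiple flags would be separated is not shown in this thread).

```python
import gzip


def parse_entry(line):
    """Split ' word=OH,f=101,flags=abbreviation,originalFreq=101' into a dict.

    Returns None for anything that is not a word entry (header, shortcut, bigram lines).
    Assumes no field value contains a comma.
    """
    line = line.strip()
    if not line.startswith("word="):
        return None
    fields = {}
    for part in line.split(","):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def find_abbreviations(lines):
    """Return (word, f) pairs for entries whose flags mention 'abbreviation'."""
    hits = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and "abbreviation" in entry.get("flags", ""):
            hits.append((entry["word"], int(entry["f"])))
    return hits


def abbreviations_in(path):
    """Convenience wrapper for a gzipped list such as pl_wordlist.combined.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return find_abbreviations(f)
```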

P.S. How are the dictionaries created? I see that the Polish one is sourced from https://github.com/openboard-team/openboard/blob/v1.4.5/dictionaries/pl_wordlist.combined.gz. The format in HeliBoard is as follows:

dictionary=main:pl,locale=pl,description=Polski,date=1414726264,version=54
 word=w,f=218,flags=,originalFreq=218
 word=i,f=209,flags=,originalFreq=209
 word=z,f=206,flags=,originalFreq=206
 word=na,f=205,flags=,originalFreq=205

But I guess there is originally a plain list of words sorted by frequency, which is then compiled into the HeliBoard format as described in the readme (https://codeberg.org/Helium314/aosp-dictionaries#readme)? And where does the information about whether a word is an abbreviation come from?
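Purely for illustration, here is how a plain word list sorted from most to least frequent could be turned into text in the combined format shown above. The header fields follow the Polish example; the linear rank-to-`f` mapping (254 down to 1) is a hypothetical choice, since this thread doesn't say how the original frequencies were derived. Setting `originalFreq` equal to `f` mirrors the sample lines, and no flags are set, so information such as `abbreviation` would have to come from some other source.

```python
import time


def to_combined(words, locale, description, version=1, max_f=254, min_f=1):
    """Turn words sorted from most to least frequent into combined-format text.

    Hypothetical: the rank-to-f mapping below is a simple linear scale,
    not the process that produced the existing AOSP word lists.
    """
    header = (f"dictionary=main:{locale},locale={locale},"
              f"description={description},date={int(time.time())},version={version}")
    lines = [header]
    n = len(words)
    for rank, word in enumerate(words):
        # Linearly interpolate from max_f (first word) down to min_f (last word).
        freq = max_f if n == 1 else round(max_f - (max_f - min_f) * rank / (n - 1))
        lines.append(f" word={word},f={freq},flags=,originalFreq={freq}")
    return "\n".join(lines) + "\n"
```

For example, `to_combined(["w", "i", "z"], "pl", "Polski", version=54)` assigns `f=254`, `f=128` and `f=1` to the three words.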

woj-tek, Aug 13 '24 08:08

The whole suggestion score (and thus order) thing is a bit of a mystery to me. It all happens in the C++ part, which I never even looked at (and I don't plan on having a look). There are some small adjustments, though, to slightly prefer word combinations that have been typed previously.
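HeliBoard's real scoring lives in the native code and is not reproduced here. As a toy model of the kind of adjustment discussed in this issue, the Python sketch below re-ranks candidates by a base score, boosts a word previously typed after the preceding word, and damps entries flagged as abbreviations. The `Candidate` type, the scores and both multipliers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    score: float              # hypothetical base score from the dictionary lookup
    abbreviation: bool = False


def rerank(candidates, previous_word, typed_bigrams,
           bigram_bonus=1.1, abbrev_penalty=0.8):
    """Toy re-ranking, not HeliBoard's algorithm.

    - multiply by bigram_bonus if (previous_word, word) was typed before
    - multiply by abbrev_penalty if the entry is flagged as an abbreviation
    """
    def adjusted(c):
        s = c.score
        if (previous_word, c.word) in typed_bigrams:
            s *= bigram_bonus
        if c.abbreviation:
            s *= abbrev_penalty
        return s

    return sorted(candidates, key=adjusted, reverse=True)


cands = [Candidate("OH", 1.0, abbreviation=True), Candidate("oh", 0.9)]
print([c.word for c in rerank(cands, None, set())])  # ['oh', 'OH']
```

A blanket penalty like this also demotes abbreviations that users do want, which is the kind of side effect raised later in this thread.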

But where does come from information whether it's abbreviation or not?

The list should be the same as the one from the AOSP keyboard, which is also used by LineageOS: https://github.com/LineageOS/android_packages_inputmethods_LatinIME/commits/lineage-21.0/dictionaries/pl_wordlist.combined.gz That's all the information I have. As far as I understand, the flags are not used for anything; I don't even know whether they end up in the compiled dictionaries.

Hmm... the point of reporting issue is improving the application itself :)

If you have unwanted words in the dictionary, changing the application might fix some particular issue, but will end up breaking a lot of other cases.

Helium314, Aug 25 '24 19:08

The whole suggestion score (and thus order) thing is a bit of a mystery to me. It's all happening in the c++ part, which I never even looked at (and I don't plan on having a look).

The things under https://github.com/Helium314/HeliBoard/tree/main/app/src/main/jni? (ouch, C++ :/ )

(I do wonder whether having the code in native C really gives much of a performance boost…)

That's all the information I have. As far as I understand the flags are not used for anything, I don't even know whether they are in the compiled dictionaries.

They are, as I found when digging into the dictionaries here: https://github.com/Helium314/HeliBoard/issues/1043#issuecomment-2285706787

If you have unwanted words in the dictionary, changing the application might fix some particular issue, but will end up breaking a lot of other cases.

Hmm... that could be true, though messing with the dictionary (if republished) could break things for others as well.

woj-tek, Aug 26 '24 08:08

The whole suggestion score (and thus order) thing is a bit of a mystery to me. It's all happening in the c++ part, which I never even looked at (and I don't plan on having a look).

The things under https://github.com/Helium314/HeliBoard/tree/main/app/src/main/jni? (ouch, C++ :/ )

(I do wonder if the code being in native C does gain much performance boost from such implementation…)

I would assume it does, but the bigger impact is potentially on memory consumption. The native code was written more than 10 years ago, i.e. for significantly weaker phones, and probably for a Java side that was less optimized than it is now.

I'd be fine with someone coming along and offering a PR to replace this with something clearly better (with glide typing in mind, or without changing the interface to Java). But this is likely a huge amount of work; see also #668.

That's all the information I have. As far as I understand the flags are not used for anything, I don't even know whether they are in the compiled dictionaries.

They are as I dug into the dictionaries here: #1043 (comment)

I don't understand. That post contains excerpts from word lists, but doesn't show whether the flags are actually used when compiling. Or did you check some other way?

Helium314, Aug 27 '24 19:08

I would assume it does, but potentially the bigger impact is memory consumption. The native code was written more than 10 years ago, so for significantly weaker phones, and probably for less optimized Java part than now.

Hmm... quite possible, but then again, HeliBoard is quite compact and efficient by today's standards IMHO. And I think one could write efficient Java without much pressure on memory (primitive data types/collections, etc.), which should still be easier to work with than C/JNI…

Thanks for the link to the glide improvement issue (though I'm not really into glide typing :) ).

I don't understand. In this post are excerpts from word lists, but not whether flags are actually used when compiling. Or did you check some other way?

No, I haven't checked that, just the word list. I tried to figure out how dicttool works and went down a rabbit hole trying to get the sources: I decompiled the jar (no mention of "abbreviation"), then attempted to find the actual sources, only to learn that it's part of AOSP, that getting AOSP sources is "fun", and that it seems to be just a make goal (https://github.com/aosp-mirror/platform_build/blob/main/core/ninja_config.mk#L26), and then I got stuck :)

woj-tek, Aug 29 '24 09:08

And I think one could write efficient Java without all that much pressure on memory (primitive data types/collections, etc) which should be still easier to work with than C/JNI…

It's probably possible to get something very usable, at least on reasonably modern devices. But it might be a bit of work...

then attempt to actually find the sources and it turns out it's part of AOSP and getting AOSP sources is funny and then it looks like it's just a make goal (https://github.com/aosp-mirror/platform_build/blob/main/core/ninja_config.mk#L26) and got stuck :)

I remember trying this too (or something similar), without success... It would be a nice extra if the keyboard were able to compile its own dictionaries, so users could simply provide a word list.

Helium314, Aug 30 '24 19:08

Probably it's possible to get something very usable, at least on reasonably modern devices. But it might be a little bit of work...

I wonder whether recent improvements to Java (records, value classes, etc.) could one day reach the Android runtime and make it more efficient…

I remember also trying this (or something similar) without success... It would be a nice extra if the keyboard was able to compile its own dictionaries, so users can simply provide a word list.

Hmm... given that the converter is (or should be) in AOSP, it should be open source, so would it be possible/OK (licence-wise) to extract it and include it in HeliBoard?

woj-tek, Sep 09 '24 07:09