giellakbd-ios

Incomplete word symbol

Open snomos opened this issue 3 years ago • 5 comments

For various word part prediction / completion approaches, we need a way to tell the keyboard that word form suggestion X is not a complete word (yet), and thus should not be followed by a space character.

The actual prediction system will probably vary (fst-based, machine learning based, something else?), but the point here is that in certain cases, the suggestions given by the speller are NOT complete words, just fragments of words. The basic idea is that for languages with complex morphology, it will help users write if we can suggest parts of words at natural break points, and that when one part is selected, the system will suggest the next part. This means that the suggested parts are not real or full words, and thus should not be followed by a space character when selected.

The symbol should not be visible; it is just a hint to the underlying system about whether a space character should be added after a selected suggestion. When an incomplete word suggestion is selected, the full input string up to the end of the selected suggestion should be used to create new suggestions.
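The selection behaviour described above could be sketched as follows. This is a minimal illustration, not the keyboard's actual code: the function names `select_suggestion` and `prediction_input` are hypothetical, and it assumes that the predictor is fed the word in progress (the text after the last space).

```python
def select_suggestion(buffer: str, suggestion: str, is_complete: bool) -> str:
    """Replace the word being typed with the selected suggestion.

    A trailing space is appended only when the suggestion is a complete word.
    """
    # Everything up to and including the last space is earlier, finished text.
    head, sep, _current_word = buffer.rpartition(" ")
    new_buffer = head + sep + suggestion
    return new_buffer + " " if is_complete else new_buffer


def prediction_input(buffer: str) -> str:
    """Return the string to send to the predictor (assumed: the word in progress)."""
    return buffer.rpartition(" ")[2]


# Selecting an incomplete suggestion keeps the cursor inside the word ...
buffer = select_suggestion("nikî", "nikî-", is_complete=False)
# ... so the whole fragment is what the next round of suggestions is based on.
next_query = prediction_input(buffer)
```

After a complete word is selected, the buffer ends in a space and `prediction_input` returns an empty string, i.e. a new word starts.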

There is presently an alpha version of a system like this for Plains Cree (in the Divvun Dev Keyboard app), available when using the circumflex SRO keyboard layout («nêhiyawêwin», note circumflexes, not macrons). Getting a space character for every selected continuation is not fun.

To test it, try input as follows:

| user input | suggestions | explanation |
| --- | --- | --- |
| nikî | nikî-~, nîki | nîki is a complete word; nikî-~ is an incomplete word |
| nikî-wâp | nikî-wâpam~, nikî-wâpamâw | nikî-wâpam~ is an incomplete word; nikî-wâpamâw is a complete word |

In this table, ~ is used to visually mark a word fragment as incomplete.

Whether or not to use a visual marker for incompleteness probably has to be language specific. It is probably not needed for Plains Cree, as the hyphen will give enough feedback. For other languages it will probably be needed.

Further discussions and examples can be found in https://github.com/giellalt/keyboard-crk/issues/14 and linked issues.

@Eijebong @bbqsrc @aarppe

snomos avatar Sep 22 '22 07:09 snomos

If the marker of an incomplete word is part of the orthography, indicating word-internal structure (such as the hyphen -), then it must remain in the output when a suggestion of such an incomplete word is selected, but without a trailing space.

However, if the marker of an incomplete word is not part of the orthography, such as the tilde ~, then it should not be output when the suggestion is selected, and the selection should not be followed by a space either.

The current crk alpha version actually deletes all incomplete word markers (~) from the input, in case such a marker has ended up there as a result of selecting a suggested incomplete word that carries it. The removal of the marker could just as well be done on the code side.
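A rough sketch of how these rules might look on the code side. It assumes that the tilde is the internal marker and that it only ever appears at the end of a suggestion; `resolve_selection` and `scrub_input` are hypothetical names, not existing keyboard functions.

```python
INCOMPLETE_MARKER = "~"  # assumed internal marker, not part of the orthography


def resolve_selection(raw: str) -> str:
    """Return the text to insert for a selected suggestion."""
    if raw.endswith(INCOMPLETE_MARKER):
        # Incomplete word: drop the marker, keep an orthographic hyphen
        # if there is one, and do not add a trailing space.
        return raw[: -len(INCOMPLETE_MARKER)]
    # Complete word: output unchanged, followed by a space.
    return raw + " "


def scrub_input(text: str) -> str:
    """Remove stray markers that have ended up in the input string."""
    return text.replace(INCOMPLETE_MARKER, "")
```

With this, selecting `nikî-~` inserts `nikî-` (hyphen kept, no space), while selecting `nikî-wâpamâw` inserts the word plus a space.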

aarppe avatar Sep 22 '22 09:09 aarppe

I think it will make the system technically simpler, and thus easier to maintain, if we separate what is shown to the user and what is used internally. Only one symbol internally, for all languages. What is shown to users is language specific, as suggested above.
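To illustrate the separation, here is a small sketch with one internal symbol and a per-language choice of what is displayed. The table contents, the fallback symbol and the function name `display_form` are all assumptions made for the example; only the idea that Plains Cree probably needs no extra visual marker comes from this issue.

```python
INTERNAL_MARKER = "~"  # assumed: one internal symbol for all languages

# Hypothetical per-language settings: what the user sees instead of the
# internal marker. An empty string means "show nothing".
DISPLAY_MARKER = {
    "crk": "",  # Plains Cree: the hyphen probably gives enough feedback
}
DEFAULT_DISPLAY_MARKER = "…"  # assumed fallback for other languages


def display_form(raw: str, language: str) -> str:
    """Turn an internally marked suggestion into what is shown to the user."""
    if not raw.endswith(INTERNAL_MARKER):
        return raw  # complete word, shown as-is
    stem = raw[: -len(INTERNAL_MARKER)]
    return stem + DISPLAY_MARKER.get(language, DEFAULT_DISPLAY_MARKER)
```

The internal marker never reaches the user: for `crk` the suggestion `nikî-~` is shown as `nikî-`, and for a language without an entry in the table the fallback symbol is appended instead.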

snomos avatar Sep 22 '22 10:09 snomos

> ... we separate what is shown to the user and what is used internally.

This would be fine in my opinion. But then the sign that a language's orthography uses to indicate morpheme boundaries (and thus incomplete words), e.g. the hyphen - in the case of Plains Cree, needs to be kept functionally separate from the marker character used for indicating such incompleteness, i.e. such a character needs to be both shown to the user and included in the appropriate suggestion (if the hyphen were used as the internal marker). In this sense, a character such as the tilde ~ used to indicate incompleteness can be rendered in whichever way shows the incompleteness of a suggestion (whether as a tilde or some other visual form), and then be discarded from the actual suggestion that the user chooses to continue with, but the hyphen as the orthography-internal marker would still need to be output and included in such suggestions.

aarppe avatar Sep 22 '22 19:09 aarppe

I started to experiment with this a bit: hfst-ospell-predict and the analyser branch of divvunspell have the switch -C for the incomplete word symbol. In divvunspell I wrote it so that the Suggestion datatype has an extra bool field for the finishedness of the word; maybe that will be usable downstream for the UI.
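For illustration only, this is roughly how a UI could consume such a flag. The sketch is a hypothetical Python stand-in, not the actual divvunspell type: the field names `value` and `completed` and the helper `text_to_insert` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical mirror of a suggestion carrying a completeness flag."""

    value: str       # text to insert, without any internal marker
    completed: bool  # False when the value is only a word fragment


def text_to_insert(suggestion: Suggestion) -> str:
    """Append a space only after complete words."""
    return suggestion.value + (" " if suggestion.completed else "")


suggestions = [
    Suggestion("nikî-wâpam", completed=False),
    Suggestion("nikî-wâpamâw", completed=True),
]
inserted = [text_to_insert(s) for s in suggestions]
```

Carrying the flag as a separate field means the UI never has to parse a marker character out of the suggestion string.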

flammie avatar Oct 07 '22 12:10 flammie

Another aspect to the table above is that sometimes a string is both complete and incomplete, e.g. nikî-wâpamâw would be the complete form with the translation 's/he saw him/her', but at the same time also the incomplete portion of nikî-wâpamâwak, translated as 's/he saw them'.
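A toy sketch of this double status, using a two-word list as a stand-in for a real analyser. The word list approach and the function name `status` are assumptions for the example; a string counts as complete if it is in the list, and as incomplete if some longer entry starts with it.

```python
from __future__ import annotations


def status(candidate: str, lexicon: set[str]) -> tuple[bool, bool]:
    """Return (is_complete, is_incomplete) for a candidate string."""
    is_complete = candidate in lexicon
    # Incomplete: a proper prefix of at least one longer word form.
    is_incomplete = any(
        word != candidate and word.startswith(candidate) for word in lexicon
    )
    return is_complete, is_incomplete


# Toy lexicon containing only the two forms mentioned above.
LEXICON = {"nikî-wâpamâw", "nikî-wâpamâwak"}

both = status("nikî-wâpamâw", LEXICON)        # complete and incomplete
only_fragment = status("nikî-wâpam", LEXICON)  # incomplete only
```

One consequence is that a single boolean per string is not always enough: for a string like nikî-wâpamâw the system may want to offer two suggestions, one complete (followed by a space) and one incomplete (no space, continuing to nikî-wâpamâwak).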

aarppe avatar Oct 10 '23 07:10 aarppe