HeliBoard icon indicating copy to clipboard operation
HeliBoard copied to clipboard

Spellcheck all enabled scripts and improve URL detection

Open codokie opened this issue 1 year ago • 17 comments

I made some changes to the spellcheck so that it checks enabled locales other than the active one if they are of different scripts. If they are of different script, there is no concern that the spellcheck would validate typos in written other locales. If no locale could be determined, then the text consists of non-letters only (numbers, special characters, etc.) so it should not be spell checked.

The result is saved in the cache for quick retrieval since the spellcheck service may be called multiple times for the same word. This change should also make it seem like the spellchecker can "remember which language a word was typed" (#93) even though the TextInfo has no attribute for the locale.

If the experimental setting "URL detection" is enabled, URLs are checked using a regex pattern Matcher to prevent them from being flagged as typos. For some reason though, URLs containing '?', '!', or ';' are still truncated before they are sent to the spellchecker, so it marks the text after any of those symbols as a typo if its not a valid dictionary word.

codokie avatar Apr 01 '24 14:04 codokie

Are the changes to URL detection and spell check dependent on each other? If not, I would much prefer to have the changes split up into separate PRs.

Helium314 avatar Apr 03 '24 18:04 Helium314

Why actually check all enabled scripts? I remember somewhere you wrote that checking the whole text when adding a new word is already inefficient. Checking it in all scripts seems even less efficient than that. When it comes to performance, you should also be very careful with regexes.

Btw it might be good to define what is the wanted behavior, because from your description I don't think I understand it.

the URL detection changes are minimal and were made solely for the spellchecker

I did not check the code yet, but it looks like your are changing RichInputConnection and InputLogic which are not related to the spell checker.

Helium314 avatar Apr 03 '24 20:04 Helium314

@RHJihan could you please check whether this solves your old issue with the spellchecker #93?

Yes, it solves the problem.

RHJihan avatar Apr 28 '24 07:04 RHJihan

Could you make changes to spellcheck so that it doesn't check single letters and names after @?

For example, when replying to someone on Discord or GitHub, their nick will be underlined and this is a bit irritating. Single letters are also underlined for some reason. Even capital letters of those that are already in the dictionary, e.g. O or Z.

maruuk avatar May 23 '24 09:05 maruuk

Oh, that's great, thanks! I think spellchecker shouldn't underline email addresses either.

maruuk avatar May 24 '24 06:05 maruuk

I finally had a look at this PR, and was reminded why I had postponed this several times. Sorry for this story again, but it really bothers me. (you even wrote "URLs, numbers and special characters are checked against the dictionary as well. Unsurprisingly, they are then flagged as typos (red underlined). But that's for another issue..")

This should be at least 4 PRs:

  • check in all enabled scripts
  • improve URL detection
  • change cache usage (avoid underlining previously enabled languages)
  • at least one more PR for the other changes (like treatment of short words, or separating words on dash, which btw I didn't find mentioned anywhere, it just appears in the code)

Advantages:

  • less time I need to spend for review, because I only need to understand a single change instead of deciphering which change belongs where
  • faster review, because I can read a short PR in much less time than a long one (time I need to understand grows more than linear with length)
  • increased probablility I'm actually looking at PRs instead of thinking "not another one of those things, will check later"
  • I can relatively simply revert an individual change if it turns out to be breaking stuff / introducing bad behavior

Now for the actual code stuff: I think it has potential for issues (inconsistencies with cache and when things in other languages are underlined or not) but overall it's an improvement.

Some comments / questions in the code section.

I don't know how bad is the performance impact of regex pattern matching on low-end phones

Performance is fine.

Helium314 avatar Jul 23 '24 20:07 Helium314

@Helium314 thanks for the review, you've raised some good points.

Admittedly I'm somewhat lazy and don't like fixing merge conflicts so that's why it's all wound up in a single PR. Even if it's bad practice, I don't think it's uncommon to do a overhaul of a single feature (in this case, the spellchecker) in one PR. Especially if only about 100 lines of code were modified.

As you know from my other PRs I kind of lost interest with the project and have less time currently so I don't know when/if to rework this PR. Sorry about that..

If you would still like to merge this PR (you could also split it as you had suggested), I'll do my best to answer (hopefully not too many) questions here (within reasonable time).

codokie avatar Jul 24 '24 08:07 codokie

Even if it's bad practice, I don't think it's uncommon to do a overhaul of a single feature (in this case, the spellchecker) in one PR.

Such an overhaul can be done, but needs to be in a better discussed and documented way. Things like removing CHECKABILITY_TOO_MANY_NON_LETTERS is not in the linked issue, and not in the description of your PR. I wouldn't be surprised if this change is disliked by a bunch of users. Reviewing and understanding changes to code I have barely read once (which is the case for the spell checker) is quite time consuming, and even more so when you find undocumented extras and you have to decipher whether they are actually undocumented changes, or just related to something you don't yet understand. With my current understanding of the spell checker, I never would have agreed to a general overhaul in a single PR. And even with more knowledge, such things should be first discussed with other community members (personally I don't really care because I don't use the spell checker).

If you would still like to merge this PR

I will merge it with some smaller changes as mentioned above.

Helium314 avatar Aug 25 '24 15:08 Helium314

Things like removing CHECKABILITY_TOO_MANY_NON_LETTERS

What I suggested is also the behavior of some spellcheckers like the one present in Microsoft Word. No one seems to be upset that currently the spellchecker labels numbers as typos, so I highly doubt you will see someone complain if a mix of letters and numbers will be labeled as a typo.

The reason why your fork of OpenBoard interested me is because you added some neat features to it. It seems that (perhaps due to time constraints) you are no longer focused on renovating the keyboard anymore, only to maintain it. That's understandable, but you did not convince me that my ideas are not desirable by others.

I had spent some considerable time debugging the spellchecker and I think I have more than a basic understanding of how it works. I assure you that the person who originally added this functionality did not get it completely right (not trying to claim that I know much better.)

I won't be participating in this PR anymore because I do not think that your review is open-minded and you will likely heavily change things you do not really understand or able to test. So feel free to alter my undecipherable code as you see fit, or to close the PR if you feel offended (not exactly my intention)

codokie avatar Aug 26 '24 09:08 codokie

(My comment below is off-topic so I'm hiding it by default.)

Read my comment

It's so sad to read this comment, even though I can totally understand it. ☹️

@Helium314 I hope you'll find a compromise so that @codokie can work on this application again. His ideas need to be exploited, and creating a developer group may be a good idea so that you can freely discuss future improvements. I think 2, 3 or 4 people maximum to manage the application might be a good idea.

I hope this will be taken into consideration.

BlackyHawky avatar Aug 26 '24 09:08 BlackyHawky

@BlackyHawky

Thank you for your support, I really appreciate it. Sadly it's no use, @Helium314 isn't interested to have me as a contributor (https://github.com/Helium314/HeliBoard/issues/1019#issuecomment-2558573947).

@Helium314

I've now closed all my open PRs so you won't have to look at my mess any more, I hope you are satisfied..

codokie avatar Jan 23 '25 08:01 codokie

@codokie I know because I've read your conversations... It's sad news and I can understand how you feel. ☹️

The only thing I'm wondering is: why haven't you created your own variant of Heliboard? 🤔

Take care and maybe see you soon. 😉

BlackyHawky avatar Jan 23 '25 09:01 BlackyHawky

@BlackyHawky Thanks for being so supportive :handshake:

why haven't you created your own variant of Heliboard?

I guess this whole ordeal has left such a bitter taste in my mouth that continuing the legacy of this project by forking it doesn't feel like the right thing to do. It can also take quite a bit of time and resources to maintain a healthy repository.

Perhaps in the not far future I'll continue with app development, but I first need to brush up on my programming skills and learn new languages.

codokie avatar Jan 23 '25 10:01 codokie

@codokie Don't thank me, it's normal. In my eyes, you're a very good developer and I'm sure you could have brought a lot of positive things to Heliboard.

I wish you all the best with your projects. 👍

BlackyHawky avatar Jan 24 '25 00:01 BlackyHawky