koreader Koreader skips first letter when highlighting

*KOReader version: 202404000209 *Device: Kobo Libra 2

When highlighting with a first word being just a single letter koreader skips it.

It doesn't happen every time but more often than not.

When bug occurs: https://github.com/koreader/koreader/assets/123873769/c6c31db9-3134-403d-b249-9a5f325303eb

Rare scenarios when the bug doesn't occur: https://github.com/koreader/koreader/assets/123873769/31b46ee6-0581-459d-a30e-d89e06f38cfe

Steps to reproduce:

Set Taps and gestures > Long-press on text > Highlight
Start to highlight beginning from a word being a single letter
Highlight the rest of the text fast
Enjoy the bug

crash.log

Jul 03 '24 16:07 polo2137

Confirmed. On long-pressing on the first (single-letter) word it becomes highlighted, then once the finger moves to the next word, the first word goes unhighlighted and the highlight starts from the second word.

Jul 03 '24 17:07 hius07

It also happens with two letter words but its rarer and very seldom with three letter words.

Jul 03 '24 18:07 polo2137

I think the reason is that getTextFromPositions works differently depending on whether pos0 == pos1. https://github.com/koreader/koreader-base/blob/5c57c2fa102ed6018509ab0e40635f7e0c2af4ce/cre.cpp#L2045 The first single-letter word sometimes is dropped when pos0 ~= pos1 (trying to handle punctuation I assume). Pinging @poire-z

Jul 05 '24 09:07 hius07

So when I long-press on a single-letter word and do not move the finger, it is called with the same pos0 and pos1, and the single-letter word is highlighted. https://github.com/koreader/koreader/blob/4f75e4163626314bc4fc1b3193b7a5b6f026243d/frontend/apps/reader/modules/readerhighlight.lua#L1331 Then when I pan the selection, it is called with different start / end positions and the first word is dropped. https://github.com/koreader/koreader/blob/4f75e4163626314bc4fc1b3193b7a5b6f026243d/frontend/apps/reader/modules/readerhighlight.lua#L1510

Jul 05 '24 10:07 hius07

Can't look at the code for the moment, but I think this was already discussed (may be about CJK and selecting a single char/word).

I think when you long-press to select a single word/letter, you're inside a glyph, and we extend on both sides. When moving around, the code may use x more precisely, expecting it to span at least half the glyph width to include the glyph in the selection. Or something like that. May be or may not be on purpose, don't remember - and/or tedious to fix.

Jul 05 '24 21:07 poire-z

I'm also interested in this, as it happens to me frequently.

Jul 09 '24 07:07 AaronKnowls

@poire-z So there will be no fix attempt?

Jul 10 '24 00:07 polo2137

So there will be no fix attempt?

I didn't mean that - and neither the opposite :)

I may look at it (again) when I have the time. I think some explanation for the current behaviour (annoying may be, but simpler to handle) - and some attempt at fixing it for CJK chars (where the issue was a lot more serious) can be read at: https://github.com/koreader/koreader-base/blob/5c57c2fa102ed6018509ab0e40635f7e0c2af4ce/cre.cpp#L2094-L2132

Originally discussed around https://github.com/koreader/koreader/issues/8619#issuecomment-1012506110

Jul 11 '24 21:07 poire-z

I noticed that you can avoid this bug most of the time if you start highlighting from the very left edge of the first letter:

https://github.com/user-attachments/assets/5caabd78-4965-45d9-b355-f683da02e701

Jul 14 '24 17:07 jonnyl2

@poire-z As a disclaimer I'm not a programmer and can't even read the code but I have this idea. Would it be possible to resolve this issue by making koreader check graphically where the edge of selected area became selecting shade and then start highlight from this point? If lua allows tracking of individual pixels this seems like a simple solution.

Jul 15 '24 09:07 polo2137

Nope, that is not how all this works - nor could work. It works with char indices in text nodes - reusing complicated stuff used by other things. Not going to explain more as I would need to dig again into all that, no time at the moment. Leave with the trick mentionned by @jonnyl2 : start a bit on the left side of your starting word, your finger needs to be in the 50%-left side of the glyph (ok, hard when highlight I :), or even on the space on the left (I guess that's how we all do it, inconciously may be, you may soon get used to do that.)

Jul 15 '24 10:07 poire-z

I've been using the 'trick' above and always start selecting text from the empty space prior to the sentence I want to highlight. Now this happened today – the selection started with the ending period of the previous sentence:

Since the intended behavior is that the pan-selection only captures entire words, did I run into a different kind of bug here?

Aug 14 '24 12:08 jonnyl2

What does show HTML look like?

Aug 14 '24 14:08 Frenzie

This is the entire paragraph:

Aug 14 '24 14:08 jonnyl2

always start selecting text from the empty space prior to the sentence I want to highlight. Now this happened today – the selection started with the ending period of the previous sentence:

May be you actually started on the few pixels from the thin period width?

Since the intended behavior is that the pan-selection only captures entire words

Seems only when you happen to be inside a word https://github.com/koreader/koreader-base/blob/5c57c2fa102ed6018509ab0e40635f7e0c2af4ce/cre.cpp#L2082-L2090

Aug 14 '24 21:08 poire-z

https://github.com/koreader/koreader-base/blob/5c57c2fa102ed6018509ab0e40635f7e0c2af4ce/cre.cpp#L2082-L2090

// When panning, only extend to include the whole word when we are // actually holding inside a word. This allows selecting punctuation, // quotes or parens at start or end of selection to have them // included in the highlight.

Couldn't this rule be refined a bit as it relates to the beginning of selections? So that at least punctuation marks not preceded by a space are excluded? --edit: or I guess more accurately, punctuation marks that are immediately preceded by a letter.

Aug 15 '24 06:08 jonnyl2

I'd rather try fixing the original problem - than having to think about edge cases with the workaround :)

Aug 17 '24 22:08 poire-z

Why not both? :) Even if a fix is found for the original issue, small text and/or fat fingers could still lead to inadvertant selection starting from the punctuation of the previous sentence. Especially if the first word is really small (e.g., "I" or "It") and the space between words (and sentences) is very narrow, as is often the case depending on font properties and justification settings, etc. So this wouldn't just help with this workaround but would make text selections more robust in general.

Aug 18 '24 05:08 jonnyl2

(Ignoring previous comment for the moment.)

Been trying to fix the original issue, by making the CJK single char tweak more generic, so it applies to both ends whether CJK or not (which would fix that original issue). It's been quite nightmarish.... My logic (after many many tries...) would be going like that:

bool not_panning = r.getStart() == r.getEnd();
bool swapped = r.getStart().compare(startp) < 0;

lvPoint refpt = swapped ? endpt : startpt;
bool is_start_inside = adjustXPointer( tv, r.getStart(), refpt, true );
refpt = swapped ? startpt : endpt;
bool is_end_inside = adjustXPointer( tv, r.getEnd(), refpt, false );
// Added adjustXPointer( LVDocView *tv, ldomXPointerEx & xp, lvPoint refpt, bool towards_start )
// that is supposed to do what the CJK bit was doing.

bool to_word_start = false;
bool to_word_end = false;
bool is_start_word_char = r.getStart().isVisibleWordChar();
bool is_end_word_char = r.getEnd().isVisibleWordChar();
bool is_start_word_start = r.getStart().isVisibleWordStart();
bool is_end_word_end = r.getEnd().isVisibleWordEnd();
// XXX conditions to review...
if ( not_panning) {
    to_word_start = true;
    to_word_end = true;
}
else {
    if ( swapped ) {
        if ( is_end_inside )
            to_word_end = true;
        if ( is_start_word_char )
            to_word_start = true;
    }
    else { // When selecting text forward
        if ( is_start_inside )
            to_word_start = true;
        if ( is_end_word_char )
            to_word_end = true;
    }
}
if ( to_word_start )
    r.getStart().prevVisibleWordStart();
if ( to_word_end )
    r.getEnd().nextVisibleWordEnd();

The is_start/end_inside is supposed to detect when we are really holding inside a glyph or not. When we are not, it's not obvious to guess what the user wants.

Some sample places where we are not, marked with a pink mark in this sample (the one in the middle of the 2nd line is because of text justification and spacing added between words, and holding in this spacing may not be holding on a glyph, not even the regular space glyph):

The current behaviour when we are holding on the pink marks is a bit inconsistent, but it often triggers something odd, that nobody really ever complain about, and that I somehow quite like: it would highlight the word on the right on the previous line, and the word on the left on the next line (actually, that's not what the code is really doing: it highlights the previous/next word in the text node in logical order, which is what I described on regular normal text, but may not be when the layout is a bit more complicated, like in tables):

I think it's actually quite nice, as it shows where you are - and panning towards the north or south would keep only the word on the correct side, so it acts as a "these are both possible starts for your highlight, if it is alright, you decide now in what direction you want to go", and we'll keep only one of these "starts". The really odd thing is when you just want a dict lookup, and you'll get both words as the string to lookup (ie. une construction or titre A), which will usually fail. My above fixes currently (by chance :) I fear touching it any further) makes this behaviour more consistent.

So, I'm asking what people think about this "odd" behaviour? Is it something worth keeping and having working consistently? Or is it just too odd? The alternative would be to just not highlight anything, until the user moves and a direction is chosen. (I don't think I can easily, say if holding on the first pink marker in the left margin, select only "construction".)

Aug 22 '24 19:08 poire-z

The current behaviour when we are holding on the pink marks is a bit inconsistent, but it often triggers something odd, that nobody really ever complain about, and that I somehow quite like: it would highlight the word on the right on the previous line, and the word on the left on the next line

I don't observe this behavior on my device. It always only higlights one word: if holding in the margin, the word closest to it on the same line, and if holding in the space between two words, the word that follows. Or are you describing the behavior after your fixes?

EDIT: Sometimes it also selects the first word on the next line, or it indeed selects both the last word on the same line and the first word on the next line. Not sure what logic it follows.

So, I'm asking what people think about this "odd" behaviour? Is it something worth keeping and having working consistently? Or is it just too odd?

I've never really noticed this behavior (words being highlighted away from the touch area). But I also use L/R margins that are narrower than a finger's width and keep text left-aligned. So I would be unlikely to ever encounter this behavior during normal use.

However, the way you describe it doesn't seem unreasonable: highlight both adjacent words if starting the selection between two words, and unhighlight the one from the opposite direction when starting to pan in one direction. Wouldn't this also fix the original word-unhighight issue being raised here? (Again, I'm not sure if this is because that was your intention and you're describing the behavior after a potential fix.)

Aug 23 '24 06:08 jonnyl2

Or are you describing the behavior after your fixes?

It's more consistent after my "fixes" / "messing around". But you should notice it more easily with current koreader in these cases:

On the sides of a normal paragraph with latin text, current seems to indeed highlight the nearest word on the same line, which feels indeed nicer :/ But that behaviour is a bit more odd with CJK text, where sometimes it's the last CJK glyph on previous line that gets highlighted... Hard to figure out some logic that would do the right thing in all cases.. I guess I'll try again...

Wouldn't this also fix the original word-unhighight issue being raised here?

"word-unhighlight" ? Not sure I get what issue this is.

Aug 23 '24 06:08 poire-z

"word-unhighlight" ? Not sure I get what issue this is.

I meant that right now, a user may long-press on a word and watch it highlight, only to then see it being unhighlighted while panning to make the complete selection (apparently if that long-press started too close to the end of the word). The way I understood your description, only the word opposite from the panning direction would get unhighlighted during panning.

But perhaps this was only applicable if starting from an 'empty' area.

Aug 23 '24 07:08 jonnyl2

Ok, got it. Yes, strange things can happen when you long-press on the right half of the last glyph of a word: it works on the initial long-press as we grab on both sides (so, taking the full word on the left, and may be not the one on the right as we are stopped by a space), but when panning forward, you may get that word left out. My adjustXPointer(...) is supposed to solve that, adjusting the xpointer to the start of that glyph, in all cases (initial + panning). It solves the issues raised in this post, but causes other side effects (that we could consider good or bad, which was the object of my request for comments :))

Aug 23 '24 07:08 poire-z

I'd say if the only drawback is that single words could no longer be selected from a stationary long-press in the margin, that would be an acceptable compromise if it solves the original issue (also taking a wild guess that probably not many users do that in the first place, and therefore won't notice if it goes away/ selects two words now). But without testing it, it will be hard to give confident feedback. (As this concerns C++ code, I guess there's no easy way of testing it? Does it need a fully built package to test on a device?)

Aug 23 '24 12:08 jonnyl2

If you have a Kobo (?), I guess I could build and share a libkoreader-cre.so. (Haven't done that since the last build stuff big revamp, so not sure I'll manage.)

Aug 23 '24 13:08 poire-z

If you have a Kobo (?), I guess I could build and share a libkoreader-cre.so.

Yes, I do. Sure, if you want.

Aug 23 '24 13:08 jonnyl2

OK, managed to build it and it works on my Kobo over a few-days-old nightly. Backup and replace koreader/lib/libkoreader-cre.so with the one in libkoreader-cre-kobo.zip You can:

confirm if that idea would solve the issues with small words selection - and the "word-unhighlight" issue.
witness a bit more of the double-word selection when long-press in the margins or interspace (not sure I really like it tbh)
see if you notice any other bad behaviour changes

No need yet for detailed reports about specific cases.

(You'll get a bit more noise in your crash.log, I didn't remove my debug printf.)

Aug 23 '24 14:08 poire-z

Small word selections do work fine if starting the panning from the end of the word, but often it takes along the previous word if starting earlier. Also, it doesn't include the leading single quotation mark:

https://github.com/user-attachments/assets/eb68ab23-f775-49a1-80bf-c80666986d45

Here it starts highlighting from the last word of the previous paragraph, even when starting from the 'h' in "The":

https://github.com/user-attachments/assets/d0ab1459-7d12-449c-ac8d-35e5510c1665

Here I noticed something else odd. The word "nothing" gets broken up on the last selection and only partially highlighted:

https://github.com/user-attachments/assets/758bf6ab-b4c3-4bdd-9586-a5d213084330

The 'double word highlights' also happen when just long-pressing on any space between words; it doesn't have to be the extended spacing for justified text. So it would probably happen a lot. (It doesn't feel very intuitive to me, tbh.)

Aug 23 '24 16:08 jonnyl2

So, it feels worse than the current behaviour, right?

Thanks for the videos, it gives me other ways of highlighting to test with.

I can reproduce your first video, if I take some English book with the kind of quotes you have :) Also with some French book with quote+space. Some case I didn't test with...

I can reproduce the second video also. What you see happens when you start on the first word - but not when I start on the indentation before it.

About The word "nothing" gets broken up on the last selection and only partially highlighted, it indeed seems odd. You're a bit fast in extending the highlight, so I'm not sure how it started and would stay if you were steady. Also, to be completely, sure, can you View HTML around this word, to see if it is all in a same text node, of if per bad luck, there would be something like than nothi<span>ng.'</span></p>, which would give some explanation for that behaviour.

Anyway, lots of odd behaviours, that may hint that my idea was not too good, if it would need more code/hacks to explicitely counteract these side effects :/ Will think again about it, thanks for testing.

PS: something odd is happening to me on Github since some months ago: when there is some video, if I load the page and play the video (I don't have autoplay enabled) immediately, it works. If I load the page and wait a few minutes before playing the video, the player gives me "network error". Same with pictures: if I wait a bit to click on it and open the larger image in another tab, it fails; when I reload the page and do the same thing, I get the image. Anybody else getting this ? Or is it just my Firefox vs. Github ?

Aug 23 '24 18:08 poire-z

Here's another one: libkoreader-cre-kobo2.zip It uses !is_start_word_start instead of my true, which seems to solve both 2 first issues and restores more of the current behaviour. Can you test it again and give your general feeling ?

if ( not_panning) {
    to_word_start = !is_start_word_start;
    to_word_end = !is_end_word_end;
}
else {
    if ( swapped ) {
        if ( is_end_inside )
            to_word_end = !is_end_word_end;
        if ( is_start_word_char )
            to_word_start = !is_start_word_start;
    }
    else {
        if ( is_start_inside )
            to_word_start = !is_start_word_start;
        if ( is_end_word_char )
            to_word_end = !is_end_word_end;
    }
}

Unfortunatly, it doesn't work really well with Chinese text (where every glyph is a word, and no space are used), and I was happy with the true that made it work. But I'd rather re-add special handling of CJK (or the absence of spaces) if it feels really better with western text.

Aug 23 '24 18:08 poire-z