Optimization for double click selection for numbers

Open playgithub opened this issue 2 years ago • 10 comments

Double-clicking a number like the ones below should select the whole number:

  • 506,804,267.22
  • 33.94%

playgithub avatar Apr 24 '22 05:04 playgithub

It works as designed for different functions: the selection collects all letter and number characters (or images), ignoring anything else, then makes that string or image available to SyncTeX or another associated application that needs bare-minimum image or alphanumeric input (without punctuation).

For example, if that % were included in a web request it would foul the whole request; likewise, if commas or decimal dots were included in a $election.file it could wreak havoc.

GitHubRulesOK avatar Apr 24 '22 10:04 GitHubRulesOK

What about pattern matching by regex?

playgithub avatar Apr 24 '22 10:04 playgithub

@playgithub PDF is composited like metal type one letter at a time; words mean nothing, and a space is just the byte \x20 if it is literally there (which it does not need to be), as each few letters can be kerned different ways. Regex is discussed elsewhere as needing lots of coding when it's not just collating letter groups into a word.

@kjk I must admit that, from looking at that code in the past, I thought the letter groups ended on a concept of a terminal space. I am unsure if any change would affect about 20-25% of users at some time, dependent on historic LaTeX behavior, since it should be a coordinate-based double click rather than a string selection?

GitHubRulesOK avatar Apr 24 '22 11:04 GitHubRulesOK

> PDF is composited like metal type one letter at a time

It can be one letter at a time, but it can also be several letters at a time, e.g.

5 0 obj
<< /Length 44 >>
stream
BT
/F1 24 Tf
100 100 Td (Hello World) Tj
ET
endstream
endobj

playgithub avatar Apr 24 '22 11:04 playgithub

The trouble is that in PDF there are thousands of methods; here is another: /Author(\376\377\000a\000n\000a\000l\000o\000r\000e\000n\000a)
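For readers unfamiliar with this form: the /Author value is a PDF literal string whose octal escapes \376\377 are the bytes FE FF, a UTF-16BE byte order mark, so each following \000x pair is one 16-bit character. A minimal sketch of decoding it is below. The function names `DecodeOctalEscapes` and `Utf16BeToAscii` are hypothetical, not SumatraPDF or MuPDF APIs, and the sketch only handles `\ddd` escapes and ASCII-range characters.

```cpp
#include <string>

// Hypothetical helper: turns \ddd octal escapes of a PDF literal string
// into raw bytes. Other escapes (\n, \(, \\ ...) are not handled here.
static std::string DecodeOctalEscapes(const std::string& s) {
    std::string out;
    for (size_t i = 0; i < s.size(); i++) {
        bool isOctalEscape = s[i] == '\\' && i + 1 < s.size() &&
                             s[i + 1] >= '0' && s[i + 1] <= '7';
        if (!isOctalEscape) {
            out.push_back(s[i]);
            continue;
        }
        int value = 0;
        // An octal escape has at most three digits.
        for (int n = 0; n < 3 && i + 1 < s.size() &&
                        s[i + 1] >= '0' && s[i + 1] <= '7'; n++) {
            value = value * 8 + (s[i + 1] - '0');
            i++;
        }
        out.push_back(static_cast<char>(value & 0xFF));
    }
    return out;
}

// Hypothetical helper: if the bytes start with the FE FF byte order mark,
// reads them as UTF-16BE and keeps ASCII characters ('?' for anything else).
// Bytes without the mark are returned unchanged.
static std::string Utf16BeToAscii(const std::string& bytes) {
    if (bytes.size() < 2 ||
        static_cast<unsigned char>(bytes[0]) != 0xFE ||
        static_cast<unsigned char>(bytes[1]) != 0xFF) {
        return bytes;
    }
    std::string out;
    for (size_t i = 2; i + 1 < bytes.size(); i += 2) {
        unsigned hi = static_cast<unsigned char>(bytes[i]);
        unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(hi == 0 && lo < 0x80 ? static_cast<char>(lo) : '?');
    }
    return out;
}
```

Applied to the string above, the two steps yield the plain text "analorena", which is the same kind of content as the (Hello World) example, only stored differently.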

GitHubRulesOK avatar Apr 24 '22 12:04 GitHubRulesOK

> The trouble is that in PDF there are thousands of methods; here is another: /Author(\376\377\000a\000n\000a\000l\000o\000r\000e\000n\000a)

Yes, it can be arbitrary. However, for grouped text, selection can be optimized.

playgithub avatar Apr 24 '22 12:04 playgithub

@kjk It appears other PDF viewers, such as Acrobat and Chromium-based browsers like Edge, will include . and , in the selection, treating the number as a distinct item.

GitHubRulesOK avatar Jan 24 '24 13:01 GitHubRulesOK

I've only tested the built-in PDF viewer in Chrome, and it doesn't work this way.

A double click selects a word, which doesn't include the . , % characters.

A triple click selects a line, which might look like selecting a word with those characters if that's the only thing on the line. To see the difference you would have to put multiple words on a line.

Chrome (the browser), on the other hand, selects . and , (but not %), but only if all other characters are digits.

kjk avatar Jan 30 '24 14:01 kjk

A not fully correct improvement for numbers with . and ,:

static bool isNumeric(WCHAR c) {
    return c >= '0' && c <= '9';
}

// separators that may appear inside a number
static bool isNumberPart(WCHAR c) {
    return c == '.' || c == ',';
}

void TextSelection::SelectWordAt(int pageNo, double x, double y) {
    int i = FindClosestGlyph(this, pageNo, x, y);
    int textLen;
    const WCHAR* text = textCache->GetTextForPage(pageNo, &textLen);

    // stays true only while every word character seen so far is a digit
    bool isDigitOnly = true;
    WCHAR c;
    // scan backwards to find the start of the selection
    for (; i > 0; i--) {
        c = text[i - 1];
        if (isWordChar(c)) {
            isDigitOnly &= isNumeric(c);
            continue;
        }
        // cross '.' or ',' only while nothing but digits has been seen
        if (isDigitOnly && isNumberPart(c)) {
            continue;
        }
        break;
    }

    StartAt(pageNo, i);

    // scan forwards from the start to find the end of the selection
    for (; i < textLen; i++) {
        c = text[i];
        if (isWordChar(c)) {
            isDigitOnly &= isNumeric(c);
            continue;
        }
        if (isDigitOnly && isNumberPart(c)) {
            continue;
        }
        break;
    }
    SelectUpTo(pageNo, i);
}

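It doesn't work for ETT12.01: when double-clicking on the 01 part, it'll select ETT12. The backward scan crosses the . while it has only seen digits and only afterwards reaches the letters, so the going-backwards test needs fixing.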
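One possible way to fix the backward test, sketched under assumptions rather than taken from the SumatraPDF source: compute the plain word bounds and the digits-plus-separators bounds independently, then use the numeric bounds only when they are not glued to a letter on either side. `WordBoundsAt` and the `*Sketch` helpers are hypothetical names; the sketch works on a `std::wstring` instead of the text cache, and the stand-in for `isWordChar` is a guess at its behavior. Like Chrome, it leaves % out of the selection.

```cpp
#include <cwctype>
#include <string>
#include <utility>

// Hypothetical stand-ins for the helpers used in SelectWordAt.
static bool IsWordCharSketch(wchar_t c) { return std::iswalnum(c) != 0 || c == L'_'; }
static bool IsDigitSketch(wchar_t c) { return c >= L'0' && c <= L'9'; }
static bool IsNumberSepSketch(wchar_t c) { return c == L'.' || c == L','; }

// Returns the half-open range [start, end) to select for a click at text[pos].
static std::pair<size_t, size_t> WordBoundsAt(const std::wstring& text, size_t pos) {
    if (pos > text.size()) {
        pos = text.size();
    }
    // Pass 1: plain word bounds, as in the original behavior.
    size_t ws = pos, we = pos;
    while (ws > 0 && IsWordCharSketch(text[ws - 1])) ws--;
    while (we < text.size() && IsWordCharSketch(text[we])) we++;

    // Pass 2: candidate number made of digits and separators only.
    auto isNumChar = [](wchar_t c) { return IsDigitSketch(c) || IsNumberSepSketch(c); };
    size_t ns = pos, ne = pos;
    while (ns > 0 && isNumChar(text[ns - 1])) ns--;
    while (ne < text.size() && isNumChar(text[ne])) ne++;
    // Drop separators at the edges, e.g. the full stop ending a sentence.
    while (ns < ne && IsNumberSepSketch(text[ns])) ns++;
    while (ne > ns && IsNumberSepSketch(text[ne - 1])) ne--;

    // Reject the candidate if a letter touches it (the ETT12.01 case).
    bool touchesWord = (ns > 0 && IsWordCharSketch(text[ns - 1])) ||
                       (ne < text.size() && IsWordCharSketch(text[ne]));
    if (ns < ne && !touchesWord) {
        return {ns, ne};
    }
    return {ws, we};
}
```

With this rule, a click inside 506,804,267.22 selects the whole number, a click inside 33.94% selects 33.94, and a click on the 01 of ETT12.01 falls back to the plain word 01 instead of selecting ETT12.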

kjk avatar Jan 30 '24 14:01 kjk

OK, I edited my poor observation above, but isn't adding , and . into numeric values matching Edge and Acrobat? And 100% is not a whole number :-)

GitHubRulesOK avatar Jan 30 '24 15:01 GitHubRulesOK