sumatrapdf
Optimization for double click selection for numbers
Double-clicking numbers like the ones below should select the whole number:
- 506,804,267.22
- 33.94%
This works as designed for several features: selection collects all letter and digit characters (or images), ignores everything else, and then passes that string or image to SyncTeX or another associated application that needs a bare image or plain alphanumeric input (without punctuation).
For example, if that % were included in a web request, it would break the whole request; likewise, if commas or decimal dots were included in a $election.file, it could wreak havoc.
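To make the current rule concrete, here is a minimal C++ sketch of splitting page text into words by keeping only runs of word characters. The `isWordChar` below is a hypothetical stand-in (letters and digits via `std::iswalnum`); SumatraPDF's own test may differ in details.

```cpp
#include <cassert>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the viewer's word-character test: letters and digits only.
static bool isWordChar(wchar_t c) {
    return std::iswalnum(static_cast<wint_t>(c)) != 0;
}

// Splits text into maximal runs of word characters; punctuation such as
// '.', ',' and '%' only ends the current run and is never part of a word.
std::vector<std::wstring> wordRuns(const std::wstring& text) {
    std::vector<std::wstring> runs;
    std::wstring cur;
    for (wchar_t c : text) {
        if (isWordChar(c)) {
            cur.push_back(c);
        } else if (!cur.empty()) {
            runs.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) {
        runs.push_back(cur);
    }
    return runs;
}
```

With this rule, `506,804,267.22` splits into `506`, `804`, `267` and `22`, so double-clicking `804` selects only `804`, and `33.94%` splits into `33` and `94`.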
What about pattern matching by regex?
@playgithub PDF is composited like metal type, one letter at a time. Words mean nothing, and a space is just the byte \x20 if it is literally there (which it does not need to be), since each group of a few letters can be kerned in different ways. Regex has been discussed elsewhere as needing lots of code when the task is more than collating letter groups into a word. @kjk I must admit that from looking at that code in the past, I thought letter groups ended at a terminal space. I am unsure whether any change would affect about 20-25% of users at some point, depending on historic LaTeX behavior, since it should be a coordinate-based double click rather than a string selection?
> PDF is composited like metal type one letter at a time
It can be one letter at a time, but it can also be several letters at a time, e.g.:
```
5 0 obj
<< /Length 44 >>
stream
BT
/F1 24 Tf
100 100 Td (Hello World) Tj
ET
endstream
endobj
```
The trouble is that in PDF there are thousands of methods; here is another: /Author(\376\377\000a\000n\000a\000l\000o\000r\000e\000n\000a)
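In the `/Author` example, the octal escapes `\376\377` are the bytes FE FF (the UTF-16BE byte order mark), and each following `\000x` pair is one big-endian 16-bit code unit. A minimal C++ sketch that decodes it, under the stated simplifications: only octal escapes are translated, and only BMP characters are handled (no surrogate pairs). `unescapePdfOctal` and `decodeUtf16Be` are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// Decodes the body of a PDF literal string, translating only octal escapes (\ddd).
// Other escapes such as \n are not translated: the backslash is simply dropped.
std::string unescapePdfOctal(const std::string& s) {
    std::string out;
    for (size_t i = 0; i < s.size(); i++) {
        if (s[i] != '\\' || i + 1 >= s.size()) {
            out.push_back(s[i]);
            continue;
        }
        // Up to three octal digits may follow the backslash.
        size_t j = i + 1;
        int value = 0;
        int digits = 0;
        while (j < s.size() && digits < 3 && s[j] >= '0' && s[j] <= '7') {
            value = value * 8 + (s[j] - '0');
            j++;
            digits++;
        }
        if (digits == 0) {
            out.push_back(s[j]);  // e.g. "\\" becomes "\"
            i = j;
        } else {
            out.push_back(static_cast<char>(value & 0xFF));
            i = j - 1;
        }
    }
    return out;
}

// If the bytes start with the UTF-16BE byte order mark FE FF, decodes the
// remaining big-endian code units (BMP only); otherwise returns an empty string.
std::u16string decodeUtf16Be(const std::string& bytes) {
    std::u16string out;
    if (bytes.size() < 2 || static_cast<unsigned char>(bytes[0]) != 0xFE ||
        static_cast<unsigned char>(bytes[1]) != 0xFF) {
        return out;
    }
    for (size_t i = 2; i + 1 < bytes.size(); i += 2) {
        unsigned hi = static_cast<unsigned char>(bytes[i]);
        unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

Fed the `/Author` string body, `unescapePdfOctal` produces 20 bytes and `decodeUtf16Be` turns them into `analorena`: the same kind of text as `(Hello World) Tj`, just encoded very differently.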
Yes, it can be arbitrary. However, for grouped text, it can be optimized for selection.
@kjk It appears other PDF viewers, such as Acrobat and Chromium-based browsers like Edge, will include `.` and `,`, treating the number as one distinct item.
I've only tested the built-in PDF viewer in Chrome, and it doesn't work this way. A double click selects a word, which doesn't include `.`, `,` or `%` characters.
A triple click selects a line, which might look like selecting a word with those characters if that's the only thing on the line. To see the difference, you would have to put multiple words on a line.
Chrome (the browser), on the other hand, selects `.` and `,` (but not `%`), but only if all the other characters are digits.
A not fully correct improvement for numbers with `.` and `,`:
```cpp
static bool isNumeric(WCHAR c) {
    return c >= '0' && c <= '9';
}

// '.' and ',' may appear inside a number such as 506,804,267.22
static bool isNumberPart(WCHAR c) {
    return c == '.' || c == ',';
}

void TextSelection::SelectWordAt(int pageNo, double x, double y) {
    int i = FindClosestGlyph(this, pageNo, x, y);
    int textLen;
    const WCHAR* text = textCache->GetTextForPage(pageNo, &textLen);

    // Stays true while every word character seen so far is a digit.
    bool isDigitOnly = true;
    WCHAR c;

    // Walk backwards to the start of the word.
    for (; i > 0; i--) {
        c = text[i - 1];
        if (isWordChar(c)) {
            isDigitOnly &= isNumeric(c);
            continue;
        }
        // Accept '.' or ',' only while the word still looks like a number.
        if (isDigitOnly && isNumberPart(c)) {
            continue;
        }
        break;
    }
    StartAt(pageNo, i);

    // Walk forwards from the start to the end of the word.
    for (; i < textLen; i++) {
        c = text[i];
        if (isWordChar(c)) {
            isDigitOnly &= isNumeric(c);
            continue;
        }
        if (isDigitOnly && isNumberPart(c)) {
            continue;
        }
        break;
    }
    SelectUpTo(pageNo, i);
}
```
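It doesn't work for `ETT12.01`: when double-clicking on the `01` part, it selects `ETT12`. The backward loop accepts the `.` while `isDigitOnly` is still true, then continues through `ETT12`, which clears `isDigitOnly`, so the forward loop starts at `E` and stops at the `.`. The going-backwards test needs fixing.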
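One possible fix for the backwards test is to find plain word boundaries first, and only then let a digits-only word grow across `.` or `,`, and only when the word on the other side of the separator is also all digits. This is close to the Chrome behavior described above. A minimal C++ sketch over a plain `std::wstring`, not wired into `TextSelection`; `wordRangeAt` and `allDigits` are hypothetical names, and `isWordChar` is again a stand-in based on `std::iswalnum`:

```cpp
#include <cassert>
#include <cwctype>
#include <string>
#include <utility>

// Hypothetical stand-in for SumatraPDF's isWordChar(); the real one may differ.
static bool isWordChar(wchar_t c) {
    return std::iswalnum(static_cast<wint_t>(c)) != 0;
}
static bool isNumeric(wchar_t c) { return c >= L'0' && c <= L'9'; }
static bool isNumberPart(wchar_t c) { return c == L'.' || c == L','; }

// True if text[from, to) is non-empty and contains only ASCII digits.
static bool allDigits(const std::wstring& text, size_t from, size_t to) {
    if (from >= to) return false;
    for (size_t k = from; k < to; k++) {
        if (!isNumeric(text[k])) return false;
    }
    return true;
}

// Returns the [start, end) range to select around glyph index i (requires i < text.size()).
std::pair<size_t, size_t> wordRangeAt(const std::wstring& text, size_t i) {
    const size_t len = text.size();
    size_t start = i, end = i;
    // Phase 1: plain word boundaries, no punctuation.
    while (start > 0 && isWordChar(text[start - 1])) start--;
    while (end < len && isWordChar(text[end])) end++;
    // Phase 2: only a digits-only word may grow across '.' or ','.
    if (!allDigits(text, start, end)) return {start, end};
    // Backwards: hop a separator only if the whole word before it is digits.
    while (start >= 2 && isNumberPart(text[start - 1])) {
        size_t s = start - 1;
        while (s > 0 && isWordChar(text[s - 1])) s--;
        if (!allDigits(text, s, start - 1)) break;
        start = s;
    }
    // Forwards: same rule for the word after the separator.
    while (end + 1 < len && isNumberPart(text[end])) {
        size_t e = end + 1;
        while (e < len && isWordChar(text[e])) e++;
        if (!allDigits(text, end + 1, e)) break;
        end = e;
    }
    return {start, end};
}
```

With this rule, double-clicking `804` in `506,804,267.22` selects the whole number, `33.94%` gives `33.94` without the `%`, `ETT12.01` gives `01` or `ETT12` depending on where you click, and the period after `123.` at the end of a sentence is not selected. In `SelectWordAt`, the two ends of the range would presumably map to the `StartAt` and `SelectUpTo` calls.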
OK, I edited my poor observation above, but including `,` and `.` in numeric values would match Edge and Acrobat, wouldn't it? And 100% is not a whole number :-)