
Add keyboard annotation command

Open dalanicolai opened this issue 4 years ago • 3 comments

This PR adds a keyboard annotation command, to add markup annotations using only the keyboard. A description for its usage is given in its docstring. No problem if you do not like to merge this, but I guess some people would like it.

dalanicolai avatar Jun 13 '21 09:06 dalanicolai

Hi @dalanicolai,

I went through the code you have attached. It cleverly sets up highlighting a region using the already implemented search functionality. I'd love to have this in pdf-tools.

However, I want to see this functionality implemented on the back of the default set-mark-command (C-SPC) to select regions.

I imagine the workflow would be exactly as it is in any normal Emacs buffer:

  • Search for a word (C-s, already works in pdf-tools)
  • Start marking a region (C-SPC, does not work in pdf-tools). Mark the desired region with C-n, C-f, C-p, C-b commands.
  • Once we have an active region, use existing annotation keybindings to create the necessary annotation.

I will leave this PR open for folks who will find this patch useful, or for anyone to try and implement this workflow above.

vedang avatar Jun 20 '21 17:06 vedang

@vedang Your idea sounds nice, but I guess it will be quite cumbersome to implement with only the current set of query functions. As far as I know, the current set only offers the possibility to either obtain all text-regions on a page in full-text blocks, or obtain a single region by searching for some regexp/string match. However, poppler almost certainly provides functionality to return text regions structured by characters/words/lines etc. So if you would like to implement it as you propose, I would suggest extending the query options in the epdfinfo server. If you do, you could also extend the annotation options, because poppler also provides arrow and free-text annotations.

I have implemented such annotation functionality for pdf-tools in pymupdf-mode. Pymupdf is another option that can be used for retrieving text-regions structured by characters/words/lines. pymupdf-mode 'communicates' with pymupdf via an interactive repl, which makes this mode really slow (it was just an experiment, and then of course I preferred an interactive implementation). However, in the meantime I have discovered that emacs-epc also exists, and I expect that implementing pymupdf-mode using epc would be much faster. I would argue that it does not really matter whether you obtain info about the text-regions via pymupdf or via the epdfinfo server. Of course the epdfinfo server will be slightly faster and 'native', but pymupdf would be more 'hackable' and probably more than fast enough (mupdf itself is considerably faster than poppler btw, so maybe even using it via python would be faster than using poppler directly).

Now that I wrote this, I actually realize that you can also use mutool to extract 'structured-text' from a pdf, but it only returns XML structured by char (e.g. mutool draw -F stext filepath pagenumber).
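To illustrate, here is a rough sketch of how such char-level output could be regrouped into words with bounding boxes. The XML sample is hand-written and simplified; the element and attribute names (line, char, c, quad) are assumptions about the stext layout and may differ between mupdf versions. words_from_stext and bounding_box are hypothetical helpers, not part of pdf-tools or mupdf.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified approximation of `mutool draw -F stext` output.
# Element and attribute names are assumptions; check your mupdf version.
SAMPLE = """
<page>
  <block>
    <line>
      <font name="F1" size="10">
        <char quad="0 0 5 0 0 10 5 10" c="H"/>
        <char quad="5 0 10 0 5 10 10 10" c="i"/>
        <char quad="10 0 15 0 10 10 15 10" c=" "/>
        <char quad="15 0 20 0 15 10 20 10" c="y"/>
        <char quad="20 0 25 0 20 10 25 10" c="o"/>
      </font>
    </line>
  </block>
</page>
"""


def bounding_box(quads):
    """Smallest axis-aligned box (x0, y0, x1, y1) enclosing all quads."""
    xs = [v for q in quads for v in q[0::2]]  # even positions are x values
    ys = [v for q in quads for v in q[1::2]]  # odd positions are y values
    return (min(xs), min(ys), max(xs), max(ys))


def words_from_stext(xml_text):
    """Group per-character entries into (word, bbox) tuples, line by line."""
    words = []
    for line in ET.fromstring(xml_text).iter("line"):
        text, quads = "", []
        for ch in line.iter("char"):
            c = ch.get("c", "")
            if c.strip():
                # Non-whitespace character: extend the current word.
                text += c
                quads.append([float(v) for v in ch.get("quad").split()])
            elif text:
                # Whitespace ends the current word.
                words.append((text, bounding_box(quads)))
                text, quads = "", []
        if text:
            # A word never spans lines, so flush at the end of each line.
            words.append((text, bounding_box(quads)))
    return words


for word, box in words_from_stext(SAMPLE):
    print(word, box)
```

Splitting on whitespace is the simplest possible word boundary; hyphenation and ligatures would need extra handling.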

Btw, just thinking with you...

dalanicolai avatar Jun 20 '21 19:06 dalanicolai

Although I like the idea of what you are suggesting, after thinking a little more I would say it is more cumbersome than my current implementation. I already have a command to highlight a single word by typing it (or part of it). And as far as I understand, you intend to set the start position by searching for a word, which is also what I do now. But then you would like to expand the region with those keys, while I simply ask for a second pattern to set the end mark of the region. So I think in practice, the current implementation is simpler and faster. Instead of using the existing keys to finally create the annotation, here the annotation is created automatically: I have a customizable default annotation, and otherwise you can prefix the command with a universal argument to select another annotation style. Did you try out the current implementation?
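The two-pattern idea can be sketched as follows. This is only an illustration of the selection logic, not the code in this PR: region_between is a hypothetical helper, and the word list with its boxes is made-up sample data.

```python
import re


def region_between(words, start_pattern, end_pattern):
    """Return the words from the first match of start_pattern through the
    next match of end_pattern (inclusive), or [] if either is not found.

    `words` is a list of (text, bbox) tuples in reading order.
    """
    start = next(
        (i for i, (text, _) in enumerate(words) if re.search(start_pattern, text)),
        None,
    )
    if start is None:
        return []
    # Look for the end pattern at or after the start word.
    end = next(
        (i for i in range(start, len(words)) if re.search(end_pattern, words[i][0])),
        None,
    )
    if end is None:
        return []
    return words[start:end + 1]


# Made-up sample data: (text, (x0, y0, x1, y1)).
WORDS = [
    ("The", (0, 0, 20, 10)),
    ("quick", (25, 0, 55, 10)),
    ("brown", (60, 0, 90, 10)),
    ("fox", (95, 0, 115, 10)),
]

# Typing part of a word is enough for either pattern.
selected = region_between(WORDS, "qui", "fo")
print([text for text, _ in selected])
```

The boxes of the returned words would then be the regions passed to whatever creates the markup annotation.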

Btw, if the pdf were rendered using librsvg, then pdf-avy-highlight would also be fast, which might then make it the most convenient implementation.

dalanicolai avatar Jun 20 '21 20:06 dalanicolai