vue-pdf-embed

[Feature] Advanced Highlighting

Open valh1996 opened this issue 3 years ago • 8 comments

Hi,

I want to do extensive highlighting on the text of my PDF and not all the features are supported by pdfjs-dist.

I would like to have, as an option, a search that supports:

  • diacritics-insensitive search (unsupported), case-insensitive search (supported), entireWord search (supported)
  • several words to search, e.g. "alex" and "alice" (unsupported)
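
A minimal sketch of what a diacritics-insensitive, case-insensitive match could look like on plain text, independent of pdfjs-dist. The names fold and findFolded are hypothetical helpers, not part of any library; the only built-in relied on is String.prototype.normalize. Only the combining marks in U+0300..U+036F are stripped, which covers common Latin accents but not every script.

```typescript
interface Match {
  start: number; // inclusive index into the original text
  end: number;   // exclusive index into the original text
}

// Fold a string for comparison: NFD splits "é" into "e" + U+0301, the
// combining marks in U+0300..U+036F are dropped, and the rest is lowercased.
function fold(s: string): string {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// Search the folded text, then map every hit back to offsets in the
// original text so the original characters can be highlighted.
function findFolded(text: string, term: string): Match[] {
  let folded = '';
  const toOriginal: number[] = []; // toOriginal[i] = index in `text` of folded char i
  for (let i = 0; i < text.length; i++) {
    const piece = fold(text[i]);
    for (let k = 0; k < piece.length; k++) toOriginal.push(i);
    folded += piece;
  }
  toOriginal.push(text.length); // sentinel for matches that end at the very end

  const needle = fold(term);
  const matches: Match[] = [];
  if (needle.length === 0) return matches;

  let at = folded.indexOf(needle);
  while (at !== -1) {
    matches.push({ start: toOriginal[at], end: toOriginal[at + needle.length] });
    at = folded.indexOf(needle, at + 1);
  }
  return matches;
}

// "Álex" (precomposed) and "Aléx" (decomposed) both match "alex".
const found = findFolded('\u00C1lex met alex and Ale\u0301x', 'alex');
console.log(found.length);
```

Mapping back to the original offsets matters because folding can change the string length, so indices in the folded text cannot be used directly for highlighting.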

The first solution I propose would be to expose PDFFindController so that it can be accessed via the $ref. Then I would only have to fork the library to add the desired functionality.

The other proposal would be to expose an event (like beforeRender) that would allow us to easily alter the text of the rendered PDF. But I am not sure that this second option is possible.

Could you please help me with this feature? I really need it, and your package seems to be the "best" available right now.

valh1996 avatar Jul 14 '22 12:07 valh1996

Maybe we can take inspiration from this:

  • https://github.com/AaronMorais/pdf-highlighter/blob/master/src/pages/index.js#L184-L211
  • https://github.com/EvolutionJobs/pdf-viewer/blob/master/pdf-viewer-page.ts

valh1996 avatar Jul 14 '22 14:07 valh1996

Hi @valh1996,

Do you think updating PDFJS to 2.13.216 would cover the diacritics insensitive search problem (check this PR)? As for "several words to search", I'm not sure how this is supposed to work.

hrynko avatar Jul 17 '22 14:07 hrynko

Hi @hrynko,

Yes, thanks, I think it would be perfect for the diacritics. However, how can I access the pdfFindController to execute a find with your package?

I think it should be possible to do it via the ref by making pdfFindController public:

pdfEmbedRef.pdfFindController.executeCommand('find', {
  caseSensitive: false,
  findPrevious: undefined,
  highlightAll: true,
  phraseSearch: false,
  query: query
});

As for the "several words to search", the problem is that if I run the above find with the word "Alice" and then re-run it with the word "Alex", the second search overwrites all the previous matches for "Alice".
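
One way to avoid the overwrite, sketched on plain text rather than on the PDFFindController: search for all the words in a single pass instead of running one find per word. This is only an illustration of the idea, and escapeRegExp and findTerms are hypothetical helper names, not pdfjs-dist APIs.

```typescript
interface Hit {
  term: string;  // the text that matched, as it appears in the source
  index: number; // offset of the match in the searched text
}

// Escape characters that have a special meaning in a regular expression,
// so that a search word such as "a.b" is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive alternation from all the words and collect
// every match in one pass, so no word replaces the matches of another.
function findTerms(text: string, terms: string[]): Hit[] {
  const cleaned = terms.filter((t) => t.length > 0);
  if (cleaned.length === 0) return [];

  const pattern = new RegExp(cleaned.map(escapeRegExp).join('|'), 'gi');
  const hits: Hit[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    hits.push({ term: m[0], index: m.index });
  }
  return hits;
}

const hits = findTerms('Alice and Alex met alice', ['Alice', 'Alex']);
console.log(hits.map((h) => h.term + '@' + h.index).join(', '));
```

Whether the find controller itself can be made to accept several queries at once depends on the pdf.js version, which is why patching the library is mentioned below.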

But for that, I'm not sure whether you can do anything at your level. I could, for example, simply modify this part of the PDF.js library with patch-package.

Therefore, could you please release an update that gives access to the find command? And when will a new version with the changes on master be available (for PDFJS > 2.13.216)?

EDIT: It looks like we now have to go through the eventBus to highlight the text instead of using executeCommand. Do you have an example of highlighting in this case?
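
A rough sketch of the event-based pattern, assuming (as the EDIT above suggests) that the find controller listens for a 'find' event on the event bus. TinyEventBus is a hypothetical stand-in written for this example, not the pdf.js class, and the payload simply reuses the fields from the executeCommand call above. The exact payload the real controller expects should be checked against the pdf.js version in use.

```typescript
// Payload fields copied from the executeCommand call earlier in this thread.
interface FindPayload {
  query: string;
  caseSensitive: boolean;
  findPrevious: boolean | undefined;
  highlightAll: boolean;
  phraseSearch: boolean;
}

type Listener = (data: FindPayload) => void;

// Hypothetical stand-in for an event bus (NOT the pdf.js implementation):
// listeners register under an event name and receive dispatched payloads.
class TinyEventBus {
  private listeners: { [name: string]: Listener[] } = {};

  on(name: string, listener: Listener): void {
    (this.listeners[name] = this.listeners[name] || []).push(listener);
  }

  dispatch(name: string, data: FindPayload): void {
    for (const listener of this.listeners[name] || []) listener(data);
  }
}

const bus = new TinyEventBus();
const received: FindPayload[] = [];

// A find controller would subscribe here; this one only records the request.
bus.on('find', (data) => {
  received.push(data);
});

// The caller no longer invokes the controller directly, it dispatches an event.
function requestFind(eventBus: TinyEventBus, query: string): void {
  eventBus.dispatch('find', {
    query,
    caseSensitive: false,
    findPrevious: undefined,
    highlightAll: true,
    phraseSearch: false,
  });
}

requestFind(bus, 'Alice');
console.log(received.length, received[0].query);
```

The point of the pattern is that the component only needs to share one event bus instance between the caller and the find controller, instead of exposing the controller's methods.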

valh1996 avatar Jul 18 '22 08:07 valh1996

I've tried exposing something like the following, with no success so far:

import { EventBus, PDFFindController } from 'pdfjs-dist/legacy/web/pdf_viewer.js'
...
const findController = new PDFFindController({
  eventBus: new EventBus(),
  linkService: this.linkService,
})
findController.setDocument(this.document)

I'm not sure if this will work outside of PDFViewer yet, but if you could continue this experiment, I would appreciate a PR.

hrynko avatar Jul 20 '22 12:07 hrynko

Yes, that's what I tried too, but as you say, I don't think it can work without a viewer. I tried asking the question, but it seems that in this case we have to initialize everything manually...

Wouldn't it be easier to refactor using the simple viewer?

What are the advantages of having created the component outside the viewer?

valh1996 avatar Jul 22 '22 17:07 valh1996

> What are the advantages of having created the component outside the viewer?

I expected that not using the viewer component could be more flexible and predictable, although it would have some limitations.

> Wouldn't it be easier to refactor using the simple viewer?

It might be, but it would require additional refactoring of the component and could lead to unexpected side effects. So if it can be done without using a viewer, I would do it like this. Otherwise, I would postpone it until the next minor release.

hrynko avatar Jul 23 '22 09:07 hrynko

I tried an alternative solution: doing the highlighting while rendering the text layer, since we get the text with its exact position there. But then we lose all the advantages of the PDFFindController.
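
A minimal sketch of that text-layer approach, assuming each text item is available as a plain string at render time. splitForHighlight is a hypothetical helper: it splits one item's text into highlighted and plain segments that could then be rendered as separate spans. It only finds matches inside a single item, so a word that spans two text items is missed, which is one of the things the find controller would otherwise handle.

```typescript
interface Segment {
  text: string;
  highlighted: boolean;
}

// Split the text of one text-layer item into plain and highlighted segments.
// Matching is case-insensitive; it assumes lowercasing does not change the
// string length, which holds for most text but not for every Unicode character.
function splitForHighlight(itemText: string, term: string): Segment[] {
  if (itemText.length === 0) return [];
  if (term.length === 0) return [{ text: itemText, highlighted: false }];

  const haystack = itemText.toLowerCase();
  const needle = term.toLowerCase();
  const segments: Segment[] = [];
  let cursor = 0;
  let hit = haystack.indexOf(needle, cursor);

  while (hit !== -1) {
    if (hit > cursor) {
      segments.push({ text: itemText.slice(cursor, hit), highlighted: false });
    }
    segments.push({ text: itemText.slice(hit, hit + needle.length), highlighted: true });
    cursor = hit + needle.length;
    hit = haystack.indexOf(needle, cursor);
  }
  if (cursor < itemText.length) {
    segments.push({ text: itemText.slice(cursor), highlighted: false });
  }
  return segments;
}

const segments = splitForHighlight('Alice met alice.', 'alice');
console.log(segments.map((s) => (s.highlighted ? '[' + s.text + ']' : s.text)).join(''));
```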

So, after trying several things, I still can't get a conclusive result. Could you please help me with this?

valh1996 avatar Jul 26 '22 09:07 valh1996

I'm having issues updating PDFJS, so I'd like to resolve them first. Will have another look at the highlighting issue afterwards.

hrynko avatar Jul 27 '22 09:07 hrynko