pdf-to-markdown

Is it possible to detect highlighted sections (annotations) on a pdf and preserve that in md?

Open shrvenkataraman opened this issue 5 years ago • 3 comments

shrvenkataraman • Jun 06 '20 08:06

So you want the whole PDF content with the highlights marked somehow, e.g. as code or italic? Or would you rather just extract the highlights?

Interesting feature anyway, but I haven't looked into it so far.
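
For the first variant (whole content, highlights marked), a minimal sketch of the matching step might look like the following. It assumes the highlight annotations and the text spans are already available as rectangles in the same coordinate system. The `Box` type, the `mark_highlights` helper, the `(text, box)` span shape, and the 0.5 overlap threshold are all hypothetical and not part of pdf-to-markdown.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle (hypothetical shape, not a pdf-to-markdown type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_ratio(span: Box, mark: Box) -> float:
    """Return the fraction of the span's area covered by the highlight rectangle."""
    width = min(span.x1, mark.x1) - max(span.x0, mark.x0)
    height = min(span.y1, mark.y1) - max(span.y0, mark.y0)
    area = (span.x1 - span.x0) * (span.y1 - span.y0)
    if width <= 0 or height <= 0 or area <= 0:
        return 0.0
    return (width * height) / area


def mark_highlights(spans, highlights, wrapper="*", threshold=0.5):
    """Wrap every text span that is mostly covered by a highlight in `wrapper`."""
    parts = []
    for text, box in spans:
        hit = any(overlap_ratio(box, h) >= threshold for h in highlights)
        parts.append(f"{wrapper}{text}{wrapper}" if hit else text)
    return " ".join(parts)


spans = [
    ("Plain", Box(0, 0, 40, 10)),
    ("important", Box(45, 0, 100, 10)),
    ("text", Box(105, 0, 130, 10)),
]
highlights = [Box(44, -1, 101, 11)]
line = mark_highlights(spans, highlights)
```

The default wrapper gives italics; passing a backtick would mark highlights as code instead. Adjacent highlighted spans are wrapped one by one here, so a real implementation would probably want to merge them first.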

jzillmann • Aug 05 '20 07:08

Hi, I don't know which issues @shrvenkataraman means. I have just tested your converter and have some issues that go in the same direction. My first tests show that highlighting (e.g. bold via ** ... **) breaks at the end of a PDF line, especially in lists. This generates output of the following kind (<cr> added to show the carriage returns of the PDF and md output):

**- element_1_text_1**<cr>
  **element_1_text_2<cr>
- element_2_text_1**<cr>
  **element_2_text2<cr>
- last_element_text_1**<cr>
  **last_element_text_2**

Could you change this to something like this, please:

- **element_1_text_1<cr>
  element_1_text_2**<cr>
- **element_2_text_1<cr>
  element_2_text2**<cr>
- **last_element_text_1<cr>
  last_element_text_2**
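
A post-processing pass along these lines could produce that shape. This is only a sketch under a strong assumption: every list item is bold from start to end, so all existing markers can be dropped and one pair re-added per item. The `rewrap_bold_list` helper is hypothetical and not part of the converter.

```python
import re

# A list line: optional indent, a bullet character, at least one space, then text.
BULLET = re.compile(r"^(\s*[-*+]\s+)(.*)$")


def rewrap_bold_list(lines):
    """Re-emit a fully bold list with exactly one ** pair per list item."""
    items = []  # each item is a list of lines with all bold markers removed
    for raw in lines:
        text = raw.replace("**", "")
        if BULLET.match(text) or not items:
            items.append([text])      # a bullet starts a new item
        else:
            items[-1].append(text)    # continuation line of the current item

    out = []
    for item in items:
        match = BULLET.match(item[0])
        if match:
            # Open bold after the bullet so the list marker stays intact.
            item[0] = f"{match.group(1)}**{match.group(2)}"
        else:
            item[0] = f"**{item[0]}"
        # Close bold at the end of the item's last line.
        item[-1] = f"{item[-1].rstrip()}**"
        out.extend(item)
    return out


broken = [
    "**- element_1_text_1**",
    "  **element_1_text_2",
    "- element_2_text_1**",
    "  **element_2_text2",
    "- last_element_text_1**",
    "  **last_element_text_2**",
]
fixed = rewrap_bold_list(broken)
```

Items that are only partly bold would need the real bold ranges from the PDF, which this sketch does not track.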

I do hope this is one of the cases that he may have meant, too.

Thank you so much for your converter! Cheers Torsten

sslHello • Jun 19 '22 11:06

I guess @shrvenkataraman is talking about something like this: [image]

darkcheftar • Jun 19 '22 11:06