obsidian-annotator

annotations are gibberish

Status: Open. Smitty010 opened this issue 2 years ago · 2 comments

I was trying to annotate a PDF (https://research-repository.griffith.edu.au/bitstream/handle/10072/367173/Milanovic_2012_02Thesis.pdf?sequence=2&isAllowed=y). Unfortunately, the annotations I end up with are gibberish. [screenshot]

Please note that when I view the PDF in Adobe Acrobat Reader, select some text, copy it to the clipboard and then paste it into a text file, I also get gibberish. I think it may have something to do with the fonts used in the PDF and the PDF reader somehow not taking them into account properly. But that is a guess. So maybe it's a Hypothesis problem?

It could also be that it's encrypted somehow. But I did check the security properties, and they don't look any different from those of other PDFs that I can copy excerpts from.

Obviously, this doesn't happen with most PDFs, but it would be nice if it worked for this one :).

Smitty010, Jul 7, 2022

Not sure if you got an answer for this, but I just saw your message so I thought I'd pitch in. This generally happens because of OCR errors, but in your case the creator (or publisher) of the PDF has left out information the PDF needs for its text to be copied correctly.

One way to recover copyable text is to run the file through an OCR package (e.g. Adobe Acrobat Pro; I used OCRmyPDF). I tried it on your document and, while the text is now copyable, the paragraphing is lost.
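A minimal sketch of the OCRmyPDF route, assuming OCRmyPDF (and its Tesseract dependency) is installed and on PATH. The helper names `build_ocrmypdf_command` and `reocr` are hypothetical. `--force-ocr` is a real OCRmyPDF option that rasterizes each page and runs OCR on it. Without it, OCRmyPDF normally refuses to process pages that already contain a text layer, and that would likely apply to this PDF because its text layer exists but is garbled.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(src: Path, dst: Path) -> list[str]:
    """Build the OCRmyPDF command line for re-OCRing a PDF with a garbled text layer.

    --force-ocr rasterizes every page and replaces the existing (broken) text,
    since OCRmyPDF otherwise aborts on pages that already contain text.
    """
    return ["ocrmypdf", "--force-ocr", str(src), str(dst)]


def reocr(src: Path, dst: Path) -> None:
    """Run OCRmyPDF on src and write the OCRed copy to dst (hypothetical helper)."""
    # Fail early with a clear message if the ocrmypdf executable is missing.
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    # check=True raises CalledProcessError if OCRmyPDF exits with an error.
    subprocess.run(build_ocrmypdf_command(src, dst), check=True)
```

For example, `reocr(Path("Milanovic_2012_02Thesis.pdf"), Path("thesis_ocr.pdf"))` writes an OCRed copy that can then be opened in obsidian-annotator. Shelling out to the CLI keeps the sketch independent of OCRmyPDF's Python API.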

ShafeeqHamza, Aug 18, 2022

Thanks for that. I may try it.

Cheers

Scott

View it on GitHub: https://github.com/elias-sundqvist/obsidian-annotator/issues/191#issuecomment-1219563521

Smitty010, Aug 18, 2022