
When extracting annotations with pdf.js, non-ASCII characters are replaced with other characters

Open: Luftzig opened this issue 4 years ago • 1 comment

Zotero: 5.0.95.3 Zotfile: 5.0.16

What I am trying to do: I added annotations in Preview (the default macOS document viewer) that included, for example, the word "jūdō", and then used zotfile's extract annotations feature.

Result: The word "jūdō" from the above example was extracted as "JkdM". The whole faulty annotation:

ÿþSee JkdM, KendM, KyudM etc. (note on p.74)

Original:

See Jūdō, Kendō, Kyudō etc. 

What I expect: I expect the annotations to preserve the text or, alternatively, to be able to select the proper encoding (probably UTF-8, but different users might need other encodings).
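The garbled output above is consistent with one possible cause, which is not confirmed in this thread: the annotation text is stored as UTF-16LE with a byte order mark (bytes FF FE), and the extractor reads it one byte at a time instead of as UTF-16. The Python sketch below only simulates that assumption; the function names are hypothetical and it is not zotfile or pdf.js code.

```python
import unicodedata

BOM_UTF16_LE = b"\xff\xfe"  # byte order mark for little-endian UTF-16


def encode_as_utf16le_with_bom(text: str) -> bytes:
    """Assumed storage format of the annotation: BOM + UTF-16LE bytes."""
    return BOM_UTF16_LE + text.encode("utf-16-le")


def misdecode_bytewise(raw: bytes) -> str:
    """Simulate a decoder that ignores the BOM and maps each byte to one
    character (Latin-1), then drops control characters such as NUL."""
    chars = raw.decode("latin-1")
    return "".join(ch for ch in chars if unicodedata.category(ch) != "Cc")


def decode_respecting_bom(raw: bytes) -> str:
    """Decode as UTF-16, letting the BOM select the byte order."""
    return raw.decode("utf-16")


if __name__ == "__main__":
    original = "See Jūdō, Kendō, Kyudō etc."
    raw = encode_as_utf16le_with_bom(original)
    print(misdecode_bytewise(raw))     # ÿþSee JkdM, KendM, KyudM etc.
    print(decode_respecting_bom(raw))  # See Jūdō, Kendō, Kyudō etc.
```

Under this assumption, "ū" (U+016B) is stored as bytes 6B 01 and "ō" (U+014D) as 4D 01, so a bytewise read keeps only "k" and "M", and the leading "ÿþ" is the BOM itself. The high bytes are lost in the extracted text, so the original cannot be reliably reconstructed from the garbled string alone; the fix would have to happen at decoding time.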

Luftzig, Feb 16 '21 17:02

I have a similar problem, and here is an example that can be reproduced. The problem only occurs from time to time when extracting annotations from a PDF with zotfile.

Here is a simple example of this behavior with the pdf file from this link: https://www.scielo.br/j/rbef/a/j8y7vZt69DpS5kKYZWyV5Yz/?format=pdf&lang=pt

If I annotate the title: "Tradução comentada de um clássico de Copérnico", I get the following extracted annotation:

"Traduc òao comentada de um cl ¥assico de Cop ¥ernico" (Dias 2004:195)

Is there a way to correct this behavior for PDF files that behave this way with zotfile annotation extraction?

I use Zotero 5.0.96.3 on Arch Linux.

manouchk38, Feb 21 '22 19:02