
Docx+Citations import fails with multiple sources (Endnote)

Open · frederik opened this issue 3 years ago · 5 comments

When importing a docx that has multiple sources combined in a single citation, pandoc -s test.docx -f docx+citations -o test.json fails with:

Invalid XML:
Missing root element

I am attaching a docx for reproduction. It contains first a citation with multiple combined sources and then a citation with a single source. At first glance, the combined sources appear to be stored in the fldData element (base64-encoded), while the single source is encoded inside the instrText.
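To see what the combined citation actually stores, the base64 payload can be decoded directly. Below is a rough stdlib-only Python sketch that pulls every w:fldData element out of word/document.xml and decodes it. The function names are hypothetical, and the sketch makes no assumption about the format of the decoded bytes.

```python
import base64
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the WordprocessingML elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def fld_data_payloads(document_xml: bytes) -> list[bytes]:
    """Decode the base64 text of every w:fldData element, in document order."""
    root = ET.fromstring(document_xml)
    payloads = []
    for node in root.iter(W + "fldData"):
        # The base64 text is usually wrapped over several lines; b64decode
        # discards non-alphabet characters such as newlines by default.
        payloads.append(base64.b64decode(node.text or ""))
    return payloads


def fld_data_from_docx(path: str) -> list[bytes]:
    """Open a .docx (a zip archive) and decode the fldData in its main part."""
    with zipfile.ZipFile(path) as docx:
        return fld_data_payloads(docx.read("word/document.xml"))
```

For example, fld_data_from_docx("combined.docx") should return one bytes object per fldData element, which can then be inspected by hand.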

I have reached out to the publisher to find out the exact EndNote citation plugin version that was used to create the document. (Edit: it was EndNote X7.8, Bld 11583.)

combined.docx

Pandoc version?

pandoc 2.19.2 (installed with Homebrew on macOS, ARM)
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.2, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4

frederik, Nov 10 '22 09:11

It's a strange format here; the instrText and the data aren't even in the same node:

      <w:r w:rsidR="008138BA">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:fldChar w:fldCharType="begin">
          <w:fldData xml:space="preserve">
...base64data...
</w:fldData>
        </w:fldChar>
      </w:r>
      <w:r w:rsidR="008138BA">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:instrText xml:space="preserve">
 ADDIN EN.CITE.DATA 
</w:instrText>
      </w:r>

And there are several of these pairs in a row.
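Because the data and the instruction live in different runs, a reader has to carry complex-field state across runs instead of looking inside a single node. A minimal Python sketch of that bookkeeping follows, based on the OOXML structure of complex fields (fldChar begin, instrText runs, then fldChar separate or end). The names ComplexField and collect_fields are hypothetical, not pandoc's, and nested fields are not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class ComplexField:
    instruction: str = ""  # concatenated w:instrText text of this field
    data: str = ""         # raw (still base64) w:fldData text, whitespace removed


def collect_fields(document_xml: bytes) -> list[ComplexField]:
    """Attach each fldChar begin's fldData to the instrText runs that follow it."""
    root = ET.fromstring(document_xml)
    fields: list[ComplexField] = []
    current = None  # field whose instruction text is still being read
    for node in root.iter():  # document order, so runs are visited in sequence
        if node.tag == W + "fldChar":
            kind = node.get(W + "fldCharType")
            if kind == "begin":
                current = ComplexField()
                fld_data = node.find(W + "fldData")
                if fld_data is not None:
                    current.data = "".join((fld_data.text or "").split())
                fields.append(current)
            elif kind in ("separate", "end"):
                # The instruction ends at separate, or at end if there is no result.
                current = None
        elif node.tag == W + "instrText" and current is not None:
            current.instruction += node.text or ""
    return fields
```

Applied to the runs quoted above, each begin/instrText pair should yield one ComplexField whose instruction is " ADDIN EN.CITE.DATA " and whose data is the base64 payload.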

jgm, Nov 10 '22 17:11

@jgm, could we maybe enable Zotero and EndNote reference detection separately? IMHO the EndNote detection is de facto unusable: most documents contain combined citations, so the feature has to be turned off for all of them.

Zotero, however, works great, and I think it's one of the most valuable features added to the docx reader in recent years.

frederik, Apr 22 '24 08:04

Activating separately would only help if the same document contains both zotero and endnote citations. And that's not going to be common, is it?

Otherwise, I'd say: just use +citations for zotero and don't use it for endnote.
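One way to apply that advice automatically is to check which citation manager left fields in the document before picking the reader format. Here is a Python sketch. It assumes Zotero fields carry ADDIN ZOTERO_ITEM in their instruction text and EndNote fields carry ADDIN EN.CITE (the latter is visible in the XML above). All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def citation_kinds(document_xml: bytes) -> set[str]:
    """Report which citation managers left field instructions in the document."""
    root = ET.fromstring(document_xml)
    # Field instructions can be split over several w:instrText runs, so join
    # them before searching for the (assumed) marker strings.
    instructions = "".join(node.text or "" for node in root.iter(W + "instrText"))
    kinds = set()
    if "ADDIN ZOTERO_ITEM" in instructions:
        kinds.add("zotero")
    if "ADDIN EN.CITE" in instructions:
        kinds.add("endnote")
    return kinds


def choose_reader_format(kinds: set[str]) -> str:
    """Use +citations unless EndNote fields are present."""
    return "docx-citations" if "endnote" in kinds else "docx+citations"


def reader_format(docx_path: str) -> str:
    """Pick the -f value for pandoc based on the fields found in the .docx."""
    with zipfile.ZipFile(docx_path) as docx:
        return choose_reader_format(citation_kinds(docx.read("word/document.xml")))
```

A document with both kinds of fields falls back to docx-citations, which matches the advice to leave the extension off whenever EndNote is involved.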

jgm, Apr 22 '24 15:04

Activating it separately would let us keep using Zotero references while ignoring EndNote citations (most documents with EndNote citations fail with an error). Otherwise, we have to catch the error and rerun the conversion with citations turned off.
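The catch-and-retry workaround could look roughly like the following Python sketch. It shells out to the pandoc CLI with the options from the original report: first with docx+citations and, if pandoc exits non-zero, again with docx-citations (a leading minus disables an extension). It retries on any failure, not only the citation error. The function names are hypothetical, and the run parameter exists only so the logic can be exercised without pandoc installed.

```python
import subprocess
from typing import Callable, List

# Anything that takes an argv list and returns a CompletedProcess.
Runner = Callable[[List[str]], subprocess.CompletedProcess]


def run_pandoc(argv: List[str]) -> subprocess.CompletedProcess:
    """Run the pandoc CLI, capturing stderr so the error can be reported."""
    return subprocess.run(argv, capture_output=True, text=True)


def convert_docx(src: str, dst: str, run: Runner = run_pandoc) -> str:
    """Convert src to dst, returning the reader format that succeeded.

    Tries docx+citations first and falls back to docx-citations if pandoc
    exits with a non-zero status for any reason.
    """
    for fmt in ("docx+citations", "docx-citations"):
        result = run(["pandoc", "-s", src, "-f", fmt, "-o", dst])
        if result.returncode == 0:
            return fmt
    # Both attempts failed: surface pandoc's last error message.
    raise RuntimeError(f"pandoc could not convert {src}: {result.stderr.strip()}")
```

For example, convert_docx("test.docx", "test.json") returns "docx-citations" when only the second attempt succeeds, so the caller can log which documents lost their citation data.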

frederik, Apr 23 '24 15:04

Another possibility, perhaps, is that we could catch the error in pandoc and ignore such cases. Or issue a warning.
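Pandoc's docx reader is written in Haskell, so the following is only a Python illustration of the proposed behavior, not pandoc's code. It assumes, as the error message suggests, that the failure comes from parsing citation XML after the ADDIN EN.CITE prefix when the instruction holds none, as with ADDIN EN.CITE.DATA, whose data sits in fldData. Instead of aborting, the hypothetical parse_endnote_field warns and returns None, so a caller could keep the field's visible result text.

```python
import warnings
import xml.etree.ElementTree as ET
from typing import Optional

PREFIX = "ADDIN EN.CITE"


def parse_endnote_field(instruction: str) -> Optional[ET.Element]:
    """Parse the citation XML of an EndNote field, or warn and return None."""
    text = instruction.strip()
    if not text.startswith(PREFIX):
        return None  # not an EndNote field: nothing to do, no warning
    payload = text[len(PREFIX):].strip()
    try:
        return ET.fromstring(payload)
    except ET.ParseError as err:
        # e.g. "ADDIN EN.CITE.DATA", which leaves ".DATA" here and no XML;
        # warn instead of failing the whole conversion.
        warnings.warn(f"skipping EndNote citation field without parseable XML: {err}")
        return None
```

The design choice is to degrade per field rather than per document, so Zotero or single-source EndNote citations elsewhere in the same file would still be parsed.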

jgm, Apr 23 '24 16:04