Handling of embedded .emf files

Open zopyx opened this issue 8 years ago • 9 comments

We have DOCX files in which the authors often embed PowerPoint files. This case is not handled properly.

! LaTeX Error: Unknown graphics extension: .emf.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.429 ...16t125157.docx.tmp/word/media/image1.emf}

? x

Ideally, .emf files would be converted to proper SVGs or PNGs. If this is not possible, they should be removed and not carried forward into the LaTeX output. Perhaps a removed image could be replaced with a placeholder or a warning message.

zopyx avatar Apr 05 '16 13:04 zopyx
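A minimal sketch of the placeholder idea suggested above, written as a post-processing step on the generated .tex file rather than as part of docx2tex itself. The function name `replace_emf_graphics`, the warning text, and the `\fbox` placeholder are illustrative assumptions, not anything docx2tex provides.

```python
import re

# Matches \includegraphics with an optional [..] argument and a path ending in .emf.
EMF_GRAPHIC = re.compile(
    r"\\includegraphics\s*(?:\[[^\]]*\])?\s*\{([^{}]*\.emf)\}",
    re.IGNORECASE,
)


def replace_emf_graphics(tex):
    """Return (new_tex, warnings); every .emf include becomes a placeholder box.

    Hypothetical helper for illustration only, not part of docx2tex.
    """
    warnings = []

    def placeholder(match):
        path = match.group(1)
        warnings.append("unsupported EMF image replaced: " + path)
        name = path.rsplit("/", 1)[-1]
        # \detokenize keeps underscores etc. in the file name from breaking LaTeX.
        return r"\fbox{\texttt{\detokenize{[missing image: " + name + "]}}}"

    return EMF_GRAPHIC.sub(placeholder, tex), warnings
```

The returned warnings could be printed or written to a log so that dropped images do not go unnoticed.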

We don’t handle that yet. I already asked @mkraetke to add an HTML report output for docx2tex that contains the messages that emerge from docx2hub. These types of embeddings should be reported there (and removed or replaced with a dummy, as you suggested).

gimsieke avatar Apr 05 '16 13:04 gimsieke

I would consider this an enhancement, as docx2tex is not intended to convert images. I would suggest that image processing be done outside of docx2tex. An XProc wrapper for ImageMagick or libwmf would be a bad solution, since these tools are not capable of handling clippings, borders, etc. properly. Accordingly, I would add htmlreports accompanied by Schematron rules in a later release.

mkraetke avatar Apr 13 '16 18:04 mkraetke

I think C-REX uses Inkscape for the EMF conversions to SVG, with PNG as a fallback, and it does a reasonably good job.

zopyx avatar Apr 13 '16 18:04 zopyx

Convert images using visio

connor: [email protected]

import os
import sys
import win32com.client

# Folder containing the .emf files, passed as the first command-line argument.
folder = os.path.abspath(sys.argv[1])

# Start Visio without a visible window (requires Windows with Visio installed).
visio = win32com.client.Dispatch("Visio.InvisibleApp")

for oldfilename in os.listdir(folder):
    if oldfilename.endswith(".emf"):
        f = os.path.join(folder, oldfilename)
        doc = visio.Documents.Open(f)
        visio.ActivePage.ResizeToFitContents()  # Set the border size according to the content
        # Export to "<name>.emf.pdf"; these arguments remove the black border
        doc.ExportAsFixedFormat(1, '{}.pdf'.format(f), 0, 0, 0, 0, False, False, False, False)
        visio.ActiveDocument.Saved = True  # Mark as saved so that Close() does not prompt
        visio.ActiveDocument.Close()
visio.Quit()

pbpf avatar Nov 13 '19 00:11 pbpf

@pbpf In the environment that transpect typically runs in, there’s no Visio (or any other Microsoft software) installed.

gimsieke avatar Nov 13 '19 07:11 gimsieke

We have been using unoconv (on Debian systems) for some time with good results on .emf and .wmf. E.g.:

unoconv -f pdf -o x.pdf x.emf

However, IMHO, this is not something that docx2tex should be concerned with.

gamboz avatar Apr 17 '20 09:04 gamboz
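For a whole media folder, the unoconv call quoted above could be wrapped roughly as follows. This is only a sketch that assumes `unoconv` is on the PATH; the helper names `unoconv_command` and `convert_folder` are made up for illustration.

```python
import subprocess
from pathlib import Path


def unoconv_command(src):
    """Build the same call as in the comment: unoconv -f pdf -o x.pdf x.emf"""
    src = Path(src)
    return ["unoconv", "-f", "pdf", "-o", str(src.with_suffix(".pdf")), str(src)]


def convert_folder(folder):
    """Convert every .emf/.wmf file in folder to PDF; return the PDF paths."""
    converted = []
    for src in sorted(Path(folder).iterdir()):
        if src.suffix.lower() in {".emf", ".wmf"}:
            # check=True raises CalledProcessError if unoconv reports a failure.
            subprocess.run(unoconv_command(src), check=True)
            converted.append(src.with_suffix(".pdf"))
    return converted
```

Calling `convert_folder` on the extracted `word/media` directory before running LaTeX would keep the conversion outside of docx2tex, in line with the comment above.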

Thank you for this suggestion. I had rather frustrating results with ImageMagick. However, I'm looking for an EMF converter which is based on Java in order to wrap it as XProc extension step so docx2tex does not rely on pre-installed software (besides Java of course).

mkraetke avatar Apr 17 '20 10:04 mkraetke

inkscape 1.01 works, can we use it?

inkscape tmp.emf -o tmp.pdf

pbpf avatar Oct 25 '20 16:10 pbpf

Thank you for the suggestion. Unfortunately, we don't know the install path, and it may vary between operating systems. I would rather stick to a Java library that is capable of converting EMF properly. However, I'll have a look at how Inkscape handles EMF files.

mkraetke avatar Oct 26 '20 07:10 mkraetke
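One possible way around the unknown install path, sketched here as an external Python helper rather than a change to docx2tex: look Inkscape up on the PATH with `shutil.which` and allow an explicit override. The environment variable name `INKSCAPE_PATH` and the function names are hypothetical; the command line itself is the one quoted earlier in the thread (`inkscape tmp.emf -o tmp.pdf`).

```python
import os
import shutil
import subprocess
from pathlib import Path


def find_inkscape(env=os.environ):
    """Return the Inkscape executable, or None if it cannot be located.

    INKSCAPE_PATH is a hypothetical override variable, not an Inkscape setting.
    """
    override = env.get("INKSCAPE_PATH")
    if override:
        return override
    return shutil.which("inkscape")


def inkscape_command(executable, src):
    """Build the call from the thread: inkscape tmp.emf -o tmp.pdf"""
    src = Path(src)
    return [executable, str(src), "-o", str(src.with_suffix(".pdf"))]


def convert_emf(src):
    """Convert one EMF file to PDF; return the PDF path, or None without Inkscape."""
    executable = find_inkscape()
    if executable is None:
        return None  # caller can fall back to a placeholder or a warning
    subprocess.run(inkscape_command(executable, src), check=True)
    return Path(src).with_suffix(".pdf")
```

Returning `None` instead of failing keeps the conversion optional, so a system without Inkscape still produces output.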