
Crash because Word encoded inline image as wdp (JPEG-XR) image

Open jhrtl opened this issue 3 years ago • 0 comments

Don't let this mess up your weekend, but I'd definitely like to report this.

While creating a sample document, I pasted an image from the clipboard into Word and ended up with a docx file containing a wdp image. The clipboard contained only raw pixels, so I'm not sure why Word decided that wdp was better suited than jpg or png for encoding it. Perhaps it was about resolution, as the original was rather large. It was larger than it is now, because Word rescaled it afterwards once it saw that I wasn't using it at anywhere near that size. I was surprised that Windows and Word even know this format. Technically it is JPEG XR: https://en.wikipedia.org/wiki/JPEG_XR

Loading this docx file with the current docx4j snapshot results in:

ERROR o.d.o.c.ContentTypeManager - No subclass found for /word/media/hdphoto1.wdp; defaulting to binary

As the message says, no such content type is defined in org.docx4j.openpackaging.contenttype.ContentTypeManager. That alone isn't fatal: as long as nothing touches the image, docx4j will happily open and re-save the document without corrupting it.
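To check whether a given docx is affected before docx4j gets involved, the package can be scanned as a plain zip archive. This is only a minimal sketch using the JDK's zip classes. `MediaEntryScanner` and `findMedia` are hypothetical names, not docx4j API, and it assumes media parts live under `word/media/` as in the log message above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper, not part of docx4j: lists media entries of a docx by file extension. */
final class MediaEntryScanner {

    /** Returns the names of all zip entries under word/media/ that end with the given extension. */
    static List<String> findMedia(InputStream docx, String extension) throws IOException {
        String suffix = "." + extension.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                String lower = name.toLowerCase(Locale.ROOT);
                // Zip entry names carry no leading slash, unlike docx4j part names.
                if (lower.startsWith("word/media/") && lower.endsWith(suffix)) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Prints every wdp media entry of the docx file given as first argument.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            findMedia(in, "wdp").forEach(System.out::println);
        }
    }
}
```

Run against the attached document, this should presumably list the hdphoto1.wdp entry, though I have only reasoned about that from the log line and not from the archive itself.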

But interacting with this image, for example by loading it into a BinaryPartAbstractImage, provokes a crash:

BinaryPartAbstractImage mediaPart = (BinaryPartAbstractImage) doc.getParts().get(new PartName("/word/media/hdphoto1.wdp"));

/**
 * Exception in thread "main" java.lang.ClassCastException:
 * org.docx4j.openpackaging.parts.WordprocessingML.BinaryPart cannot be cast to
 * org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage at
 * bugrepro.WorkWithWDPImage.main(WorkWithWDPImage.java:29)
 */
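Since the part comes back as a plain BinaryPart, calling code that still wants to know what it is holding could inspect the raw bytes instead of casting. Below is a rough sketch of such a check. `ImageSniffer` is a hypothetical helper, not docx4j API. The PNG and JPEG signatures are the well-known ones, while the JPEG XR signature (`II` followed by `0xBC`) is my assumption based on the format's TIFF-like container and should be verified against the spec.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper, not part of docx4j: guesses an image format from its leading bytes. */
final class ImageSniffer {

    enum Kind { PNG, JPEG, JPEG_XR, UNKNOWN }

    /** Classifies the data by its file signature; returns UNKNOWN if nothing matches. */
    static Kind sniff(byte[] data) {
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return Kind.PNG;   // "\x89PNG"
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return Kind.JPEG;        // JPEG start-of-image marker
        // Assumption: JPEG XR / HD Photo (wdp) files begin with "II" followed by 0xBC.
        if (startsWith(data, 0x49, 0x49, 0xBC)) return Kind.JPEG_XR;
        return Kind.UNKNOWN;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Prints the guessed format of the file given as first argument,
        // for example a wdp file extracted from the docx.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(sniff(bytes));
    }
}
```

This only identifies the payload. It does not decode JPEG XR, and it doesn't change how docx4j maps the part.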

I attached a ready-to-run example: reproduction.zip. Should you prefer to code it yourself, I also attached the document on its own: wdpImage.docx.

jhrtl, May 07 '21 15:05