mammoth.js

No warning about charts

Open: MCTaylor17 opened this issue 7 years ago • 7 comments

Mammoth produces a warning for some unrecognized content, for example, an equation object:

An unrecognised element was ignored: {http://schemas.openxmlformats.org/officeDocument/2006/math}oMathPara

However, it doesn't produce a warning when a chart is embedded. The chart is silently dropped.

Attached is a sample document with both objects embedded. Converting it produces only one warning (for the equation), even though both objects are dropped:

test-document.docx

MCTaylor17, Feb 01 '18 23:02

This is likely a result of the way images are handled. Specifically, when a wp:inline element is found, the parser tries to find any images, and ignores everything else.

mwilliamson, Feb 02 '18 20:02

I'm sorry, I don't follow. The chart isn't embedded as an image.

MCTaylor17, Feb 02 '18 20:02

In order to handle images, whenever Mammoth finds a wp:inline element, it tries to find any images, and ignores everything else, including any charts (which, as you say, aren't images), without any warnings.
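To make the explanation above concrete, here is a minimal JavaScript sketch of an image-only lookup. It is not Mammoth's code: the plain-object element model (`el`, `childrenNamed`) and the function name `findImages` are hypothetical. The element names (`a:graphic`, `a:graphicData`, `pic:pic`, `pic:blipFill`, `a:blip`, `c:chart`) and the two `graphicData` URIs follow the usual DOCX/DrawingML layout. Because the lookup only follows the picture path, a chart yields an empty result and nothing records that it was skipped.

```javascript
// Hypothetical minimal element model: { name, attributes, children }.
// This is NOT Mammoth's internal representation.
function el(name, attributes = {}, children = []) {
  return { name, attributes, children };
}

// Collect the children named `name` from every element in `elements`.
function childrenNamed(elements, name) {
  return elements.flatMap((e) => e.children.filter((c) => c.name === name));
}

// Sketch of the "find images, ignore everything else" approach:
// follow the picture path down to a:blip and return the relationship IDs.
// Anything not on this path (such as a chart) is silently ignored.
function findImages(inline) {
  let current = [inline];
  for (const name of ["a:graphic", "a:graphicData", "pic:pic", "pic:blipFill", "a:blip"]) {
    current = childrenNamed(current, name);
  }
  return current.map((blip) => blip.attributes["r:embed"]);
}

const PICTURE_URI = "http://schemas.openxmlformats.org/drawingml/2006/picture";
const CHART_URI = "http://schemas.openxmlformats.org/drawingml/2006/chart";

const pictureInline = el("wp:inline", {}, [
  el("wp:extent"),
  el("a:graphic", {}, [
    el("a:graphicData", { uri: PICTURE_URI }, [
      el("pic:pic", {}, [el("pic:blipFill", {}, [el("a:blip", { "r:embed": "rId5" })])]),
    ]),
  ]),
]);

const chartInline = el("wp:inline", {}, [
  el("wp:extent"),
  el("a:graphic", {}, [
    el("a:graphicData", { uri: CHART_URI }, [el("c:chart", { "r:id": "rId6" })]),
  ]),
]);

console.log(findImages(pictureInline)); // [ 'rId5' ]
console.log(findImages(chartInline)); // [] : the chart vanishes with no warning
```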

mwilliamson, Feb 02 '18 20:02

Would it be possible to introduce a warning?

For some context, in our workflow, the person converting a document isn't always familiar with its contents. It would be useful for them to know if anything is ignored so it can be dealt with manually.

MCTaylor17, Feb 02 '18 20:02

It's certainly possible, but I've no idea when I'd have time to take a look.

mwilliamson, Feb 02 '18 20:02

Do you know which file(s) would need to be modified, or would you add a new file to handle this?

MCTaylor17, Feb 02 '18 20:02

The function readDrawingElement() in body-reader.js needs changing. The main work is in working out which children of those elements should actually be ignored (such as properties elements) and which should emit warnings (such as charts). We want to avoid emitting warnings for the case that's already handled, i.e. images. Actually changing the code should be reasonably straightforward.
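The classification described above might look roughly like the following minimal JavaScript sketch. It is not a patch to `readDrawingElement()`: the element model, the function name `readInlineDrawing`, the warning wording (modeled on Mammoth's existing "An unrecognised element was ignored" message), and especially the `IGNORED_INLINE_CHILDREN` list are assumptions. Deciding what belongs in that list is exactly the open work mentioned above. Pictures are collected as before, known property elements are skipped silently, and any other `a:graphicData` (such as a chart) produces a warning instead of disappearing.

```javascript
// Hypothetical element model: { name, attributes, children }.
function el(name, attributes = {}, children = []) {
  return { name, attributes, children };
}

function childrenNamed(element, name) {
  return element.children.filter((c) => c.name === name);
}

const PICTURE_URI = "http://schemas.openxmlformats.org/drawingml/2006/picture";

// ASSUMPTION: wp:inline children that only carry size/metadata and can be
// skipped without a warning. A real change would need to confirm this list.
const IGNORED_INLINE_CHILDREN = new Set([
  "wp:extent",
  "wp:effectExtent",
  "wp:docPr",
  "wp:cNvGraphicFramePr",
]);

// Hypothetical reader: returns image relationship IDs plus a warning for
// anything that is neither a known property element nor a picture.
function readInlineDrawing(inline) {
  const images = [];
  const warnings = [];
  for (const child of inline.children) {
    if (IGNORED_INLINE_CHILDREN.has(child.name)) continue;
    if (child.name !== "a:graphic") {
      warnings.push(`An unrecognised element was ignored: ${child.name}`);
      continue;
    }
    for (const data of childrenNamed(child, "a:graphicData")) {
      if (data.attributes.uri !== PICTURE_URI) {
        // Charts and other non-picture graphics end up here.
        warnings.push(`An unrecognised drawing was ignored: ${data.attributes.uri}`);
        continue;
      }
      // The already-handled case: pictures produce images, not warnings.
      for (const pic of childrenNamed(data, "pic:pic"))
        for (const fill of childrenNamed(pic, "pic:blipFill"))
          for (const blip of childrenNamed(fill, "a:blip"))
            images.push(blip.attributes["r:embed"]);
    }
  }
  return { images, warnings };
}

// Usage: a chart inside wp:inline now yields a warning.
const CHART_URI = "http://schemas.openxmlformats.org/drawingml/2006/chart";
const chartInline = el("wp:inline", {}, [
  el("wp:extent"),
  el("wp:docPr", { id: "1", name: "Chart 1" }),
  el("a:graphic", {}, [
    el("a:graphicData", { uri: CHART_URI }, [el("c:chart", { "r:id": "rId6" })]),
  ]),
]);
const result = readInlineDrawing(chartInline);
console.log(result.images); // []
console.log(result.warnings);
```

Keeping the ignore list explicit means any element nobody has reviewed yet defaults to a warning, which suits a workflow where the person converting the document needs to know what was dropped.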

mwilliamson, Feb 02 '18 20:02