
Access color and background/highlight color through transformDocument

Open · walling opened this issue on Sep 16 '21 · 7 comments

I have a document with a table where the background color of table cells, as well as text color and highlight color, carries meaning for the reader. Basically, it's a big timetable where colors represent various categories: a category for the whole weekend (table cell background), for a specific person during the weekend (highlighted text), or for a specific task to be done (text color). Unfortunately, I have no influence over how the people updating this document work. I need to write a small automated tool that makes this document accessible to a blind user with a screen reader, so everything needs to be marked up semantically or explained through text.

It would be nice to have a small example of how to access the color information through the transformDocument function, so that I can output it (semantically) in the generated HTML.
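Whatever the source of the color value, the last step for a screen reader is to turn each color into text. A minimal sketch, assuming the color has already been obtained as an OOXML-style hex string (per this thread, Mammoth does not currently expose it); the palette `CATEGORY_BY_COLOR`, the function `describeColoredText` and the CSS class `visually-hidden` are all hypothetical:

```javascript
// Hypothetical palette: which color means which category is specific to
// this document and has to be filled in by hand.
const CATEGORY_BY_COLOR = {
  "FF0000": "Cleaning",
  "00B050": "Cooking",
};

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a piece of colored text into HTML where the color's meaning is
// spelled out as text, so a screen reader announces it.
// `color` is assumed to be a hex string without '#', or null.
function describeColoredText(text, color) {
  const label = color ? CATEGORY_BY_COLOR[color.toUpperCase()] : undefined;
  if (!label) {
    return escapeHtml(text);
  }
  // A visually hidden label leaves the layout unchanged for sighted users;
  // the "visually-hidden" class is assumed to be defined in the page's CSS.
  return `<span class="visually-hidden">${escapeHtml(label)}: </span>${escapeHtml(text)}`;
}

console.log(describeColoredText("Alice", "ff0000"));
console.log(describeColoredText("Bob", null));
```

Spelling the meaning out as text, rather than only as a CSS color, is what lets a screen reader convey it.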

Is this possible? I played around with the API a bit, but it seems that only a few style properties are accessible, colors not being one of them. I imagine that something like this should be possible to implement:

```javascript
function transformParagraph(paragraph) {
    console.log(paragraph.children[0].color); // => '#ff0000'
    console.log(paragraph.children[0].highlightedColor); // => null (if not specified)
}
```

If this is out of scope, maybe it would help to add a method on the individual elements to access the underlying DOM somehow. I imagine something like this:

```javascript
function transformParagraph(paragraph) {
    // `dom()` is a hypothetical helper method to access the XML DOM behind this paragraph. Not sure if this is 100% correct :-)
    console.log(paragraph.dom().firstOrEmpty("w:color").attributes["w:val"]); // => 'FF0000' (w:val holds hex without '#')
}
```
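For reference, in the WordprocessingML inside a .docx file (a ZIP archive whose main body is `word/document.xml`), the three kinds of color in this issue live in different elements: text color in `w:color` (`w:val`, a hex value without `#`, or `auto`), highlight in `w:highlight` (`w:val`, a named color such as `yellow`), and table cell background in `w:shd` inside `w:tcPr` (`w:fill`). The sketch below reads them from an XML fragment with regular expressions. It is a crude illustration that assumes double-quoted attributes and the usual `w:` prefix, not a substitute for an XML parser, and `readAttribute`/`readColors` are hypothetical names:

```javascript
// Crude sketch: read one attribute from the first matching element in a
// small WordprocessingML fragment. Assumes double-quoted attributes and the
// usual "w:" prefix; a real tool should use a proper XML parser.
function readAttribute(xml, elementName, attributeName) {
  // "<w:color\b" does not match the w:color attribute of <w:shd>,
  // because the element name must directly follow "<".
  const element = xml.match(new RegExp(`<${elementName}\\b[^>]*>`));
  if (!element) return null;
  const attribute = element[0].match(new RegExp(`\\b${attributeName}="([^"]*)"`));
  return attribute ? attribute[1] : null;
}

// Hypothetical helper returning the three colors this issue cares about.
function readColors(xml) {
  return {
    textColor: readAttribute(xml, "w:color", "w:val"), // hex such as "FF0000", or "auto"
    highlight: readAttribute(xml, "w:highlight", "w:val"), // named, such as "yellow"
    // Inside w:tcPr this is the cell background; w:shd can also appear in
    // w:rPr for shaded text, and this sketch simply takes the first one.
    shadingFill: readAttribute(xml, "w:shd", "w:fill"),
  };
}

const cellXml = `
<w:tc>
  <w:tcPr><w:shd w:val="clear" w:color="auto" w:fill="FFFF00"/></w:tcPr>
  <w:p><w:r>
    <w:rPr><w:color w:val="FF0000"/><w:highlight w:val="green"/></w:rPr>
    <w:t>Alice</w:t>
  </w:r></w:p>
</w:tc>`;

const colors = readColors(cellXml);
// colors.textColor === "FF0000", colors.highlight === "green",
// colors.shadingFill === "FFFF00"
```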

Any feedback is appreciated.

walling · Sep 16 '21 16:09

As you say, I don't think Mammoth currently exposes colours, which I would expect to be on the runs (although I haven't checked). Some properties have been added purely for use in document transforms, such as the font, so I wouldn't be opposed to doing the same for colours.
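As an illustration of how such transform-only properties are used today, here is a sketch built on `font`. It assumes Mammoth's `mammoth.transforms.run` helper and the `styleName` field that style maps match with `r[style-name='...']`; the font `Wingdings`, the style name `Symbol Text`, the class `symbol`, the function `tagSymbolRuns` and the file name are hypothetical. A color property, if one were added, could presumably be used the same way:

```javascript
// Hypothetical font name: runs set in this font get a style name that a
// Mammoth style map can then match.
const SYMBOL_FONT = "Wingdings";

// Run transform: Mammoth passes each run element to this function and uses
// the returned element in its place. `font` is one of the run properties
// exposed for document transforms; color is not.
function tagSymbolRuns(run) {
  if (run.font === SYMBOL_FONT && !run.styleName) {
    // Return a copy rather than mutating the element that was passed in.
    return { ...run, styleName: "Symbol Text" };
  }
  return run;
}

// Wiring it up (requires the mammoth package, so it is shown as a comment):
//
//   const mammoth = require("mammoth");
//   mammoth.convertToHtml({ path: "timetable.docx" }, {
//     transformDocument: mammoth.transforms.run(tagSymbolRuns),
//     styleMap: ["r[style-name='Symbol Text'] => span.symbol"],
//   }).then((result) => console.log(result.value));
```

Setting a style name and letting the style map produce the markup keeps the transform itself free of HTML.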

Allowing direct access to the underlying XML is something that's come up before, but that I've never gotten around to dealing with. One of the issues is that the XML representation Mammoth uses is unique to Mammoth, so I'd be reluctant to expose it (although all of the data structures exposed by document transforms are marked as unstable anyway).

Also, while I remember, if you're looking for colours on runs, you'd probably want to use mammoth.transforms.getDescendantsOfType(paragraph, "run") rather than assuming paragraph.children[0] is a run.
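A sketch of that advice: the paragraph below (a simplified object shaped like Mammoth's element tree, with `type` and `children`) starts with a bookmark rather than a run and has a run nested inside a hyperlink, so `paragraph.children[0]` would miss both runs. To keep the example runnable without Mammoth, `getDescendantsOfType` here is a local stand-in; in a real transform you would call `mammoth.transforms.getDescendantsOfType` instead. `runFonts` is a hypothetical helper:

```javascript
// Local stand-in for mammoth.transforms.getDescendantsOfType so the sketch
// runs on its own: collect every descendant of `element` (not the element
// itself) whose `type` matches, searching the whole subtree.
function getDescendantsOfType(element, type) {
  const found = [];
  for (const child of element.children || []) {
    if (child.type === type) found.push(child);
    found.push(...getDescendantsOfType(child, type));
  }
  return found;
}

// Hypothetical helper: the font of every run in a paragraph, including
// runs nested inside hyperlinks.
function runFonts(paragraph) {
  return getDescendantsOfType(paragraph, "run").map((run) => run.font);
}

const paragraph = {
  type: "paragraph",
  children: [
    { type: "bookmarkStart", name: "saturday" },
    { type: "hyperlink", children: [{ type: "run", font: "Arial", children: [] }] },
    { type: "run", font: "Calibri", children: [] },
  ],
};

console.log(runFonts(paragraph)); // [ 'Arial', 'Calibri' ]
```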

mwilliamson · Oct 19 '21 19:10

Hey, just wanted to know if there are plans for mammoth to include colors and background colors in the future.

DugarRishab · Jun 20 '24 12:06

No plans at present.

mwilliamson · Jun 20 '24 16:06

@mwilliamson, then can I work on this? Can you guide me on how to add color support? From what I understand, this is an issue in the XML-to-JSON conversion. Is there any particular reason why this was not implemented earlier? I'm asking so I can get a better understanding of the issue.

Any information you can provide about this would be helpful. I want to implement this because it will be useful in my project.

DugarRishab · Jun 22 '24 07:06

> then can I work on this?

I'm afraid I'm not currently accepting pull requests for Mammoth.

> Is there any particular reason why this was not implemented earlier?

No particular technical reason, mostly a lack of time and that I've prioritised other functionality.

mwilliamson · Jun 22 '24 12:06