Retain numbering id in list paragraphs

Open • tripodsan opened this issue 3 years ago • 15 comments

Assume you have a document with two level-0 ordered lists:

1. one
2. two
Something else
3. three
Something else
1. one
2. two

The numbering information provided in the AST node does not identify which list (numbering instance) a paragraph belongs to. As a result, it's not possible to tell that "Three" continues the first list after the non-list paragraph, while the second "One" starts a new list.

document[8]
├─0 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0"}
│   └─0 run[1]
│       └─0 text "One"
├─1 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0"}
│   └─0 run[1]
│       └─0 text "Two"
├─2 paragraph[1]
│   │ styleId: "Normal"
│   └─0 run[1]
│       └─0 text "Something else"
├─3 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0"}
│   └─0 run[1]
│       └─0 text "Three"
├─4 paragraph[1]
│   └─0 run[1]
│       └─0 text "Something else"
├─5 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0"}
│   └─0 run[1]
│       └─0 text "One"
├─6 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0"}
│   └─0 run[1]
│       └─0 text " Two"
└─7 paragraph[0]

If the numId were added to the numbering information, it would be possible to detect the continuation:

document[8]
├─0 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0","numId":"1"}
│   └─0 run[1]
│       └─0 text "One"
├─1 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0","numId":"1"}
│   └─0 run[1]
│       └─0 text "Two"
├─2 paragraph[1]
│   │ styleId: "Normal"
│   └─0 run[1]
│       └─0 text "Something else"
├─3 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0","numId":"1"}
│   └─0 run[1]
│       └─0 text "Three"
├─4 paragraph[1]
│   └─0 run[1]
│       └─0 text "Something else"
├─5 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0","numId":"4"}      <<========
│   └─0 run[1]
│       └─0 text "One"
├─6 paragraph[1]
│   │ styleId: "ListParagraph"
│   │ numbering: {"isOrdered":true,"level":"0","numId":"4"}
│   └─0 run[1]
│       └─0 text " Two"
└─7 paragraph[0]
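With numId in the numbering information, a consumer of the tree can recover the displayed numbers by keeping one counter per numbering instance and level. The following is a minimal sketch. It assumes paragraphs are reduced to plain objects shaped like the tree above and carry the proposed numId field. assignListNumbers is a hypothetical helper, not part of mammoth, and it assumes every level starts at 1 (it ignores w:start and level overrides from numbering.xml).

```javascript
// Hypothetical helper (not part of mammoth): returns the number Word would
// display for each ordered-list paragraph, or null for other paragraphs.
// Assumes numbering objects carry the numId proposed in this issue and that
// every level starts at 1 (w:start and level overrides are ignored).
function assignListNumbers(paragraphs) {
  const counters = new Map(); // numId -> array of per-level counters
  return paragraphs.map((paragraph) => {
    const numbering = paragraph.numbering;
    if (!numbering || !numbering.isOrdered) {
      return null;
    }
    const level = Number(numbering.level);
    if (!counters.has(numbering.numId)) {
      counters.set(numbering.numId, []);
    }
    const levels = counters.get(numbering.numId);
    levels[level] = (levels[level] || 0) + 1;
    // Forget deeper levels so they restart after a shallower item.
    levels.length = level + 1;
    return levels[level];
  });
}

// The paragraphs from the tree above, reduced to text and numbering.
const ordered = (numId) => ({ isOrdered: true, level: "0", numId });
const paragraphs = [
  { text: "One", numbering: ordered("1") },
  { text: "Two", numbering: ordered("1") },
  { text: "Something else" },
  { text: "Three", numbering: ordered("1") },
  { text: "Something else" },
  { text: "One", numbering: ordered("4") },
  { text: "Two", numbering: ordered("4") },
];

console.log(assignListNumbers(paragraphs)); // [ 1, 2, null, 3, null, 1, 2 ]
```

The numId "1" items count 1, 2, 3 across the "Something else" paragraph, while numId "4" starts again at 1. Deeper-level counters are dropped whenever a shallower item appears, which mirrors Word's default restart behaviour.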
diff --git a/lib/docx/numbering-xml.js b/lib/docx/numbering-xml.js
index 64c4210..a68fbf4 100644
--- a/lib/docx/numbering-xml.js
+++ b/lib/docx/numbering-xml.js
@@ -15,13 +15,16 @@ function Numbering(nums, abstractNums, styles) {
         }),
         "paragraphStyleId"
     );

     function findLevel(numId, level) {
         var num = nums[numId];
         if (num) {
             var abstractNum = abstractNums[num.abstractNumId];
             if (abstractNum.numStyleLink == null) {
-                return abstractNums[num.abstractNumId].levels[level];
+                var lvl = abstractNums[num.abstractNumId].levels[level];
+                return Object.assign({numId: numId}, lvl);
             } else {
                 var style = styles.findNumberingStyleById(abstractNum.numStyleLink);
                 return findLevel(style.numId, level);
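
The lookup change in the patch above can be exercised in isolation. The following sketch uses simplified stand-ins for the nums and abstractNums maps built from word/numbering.xml; their shapes here are assumptions, and the numStyleLink branch is omitted. Unlike the patch, it returns a missing level unchanged. Without that guard, Object.assign({numId: numId}, undefined) would produce a truthy object carrying only numId for a level that doesn't exist.

```javascript
// Simplified stand-ins for the maps numbering-xml.js builds from
// word/numbering.xml; these shapes are assumptions for illustration.
const abstractNums = {
  "0": { numStyleLink: null, levels: { "0": { isOrdered: true, level: "0" } } },
};
const nums = {
  "1": { abstractNumId: "0" },
  "4": { abstractNumId: "0" }, // second list instance, same definition
};

// Same idea as the patch (numStyleLink branch omitted): return a copy of
// the level tagged with the numId that was looked up. A missing level is
// returned as-is, so the paragraph still counts as unnumbered.
function findLevel(numId, level) {
  const num = nums[numId];
  const abstractNum = num ? abstractNums[num.abstractNumId] : null;
  if (!abstractNum) {
    return null;
  }
  const lvl = abstractNum.levels[level];
  return lvl ? Object.assign({ numId: numId }, lvl) : lvl;
}

console.log(findLevel("1", "0")); // { numId: '1', isOrdered: true, level: '0' }
console.log(findLevel("4", "0")); // { numId: '4', isOrdered: true, level: '0' }
console.log(findLevel("4", "5")); // undefined
```

In the patch as shown, the numStyleLink branch recurses with style.numId. Levels reached through a numbering style would therefore be tagged with the style's numId rather than the paragraph's own.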

tripodsan • Jan 14 '21

Hello @tripodsan, I'm facing the same issue. Could you please share your solution?

hoang • Aug 08 '21

I have a fork: https://github.com/adobe-rnd/mammoth.js/tree/bleeding that uses my suggestion above: https://github.com/adobe-rnd/mammoth.js/commit/60a679eb0c0599c7b0f5d2ca83fbb1c55b84c73e

It's released as: https://www.npmjs.com/package/@adobe/mammoth/v/1.4.15-bleeding.1

tripodsan • Aug 08 '21

Hey @tripodsan! Your solution seems perfect! Congrats and thanks for sharing it!

I'm wondering why this is a fork and not a PR on this repo. Obviously we'd prefer to use the original package in production.

Would you mind making a PR with your work here? If not, could I make one myself using your work?

Have the best day!

VictorBaron • Dec 01 '21

Hi @VictorBaron. I can't remember why I didn't submit a PR, but I will create one ASAP.

tripodsan • Dec 01 '21

@tripodsan Hey, thanks so much for your fix! Just curious if you still plan on making a PR?

hmnd • Jan 18 '22

Sorry @hmnd, I was preoccupied with other things. I'll take a look at it now.

tripodsan • Jan 18 '22

If anyone who's interested in this issue could post a minimal example document, the expected HTML, and the actual HTML, then that would be helpful.

Also, since the suggestion here is just to add the numbering ID, presumably other things are being done as well, e.g. a document transform? I'm reluctant to just add the numbering ID without understanding how that actually solves the problem.

mwilliamson • Sep 19 '22

It only solves half of the problem: the case where the document tree is used for further processing (e.g. generating Markdown). It doesn't include a solution for the HTML rendering.
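For the Markdown case, the extra numId is enough because CommonMark takes an ordered list's start number from its first item. A generator therefore only needs the right number on each item. The following is a minimal sketch. It assumes level-0 paragraphs shaped like the tree in the issue description, with the proposed numId field. toMarkdown is a hypothetical helper, not mammoth's API, and it assumes every list starts at 1.

```javascript
// Hypothetical helper (not mammoth's API): renders level-0 paragraphs as
// Markdown, numbering ordered items per numId so that a list resumed after
// other text keeps counting. Assumes every list starts at 1.
function toMarkdown(paragraphs) {
  const counters = new Map(); // numId -> last number used
  const blocks = [];
  let previousNumId = null;
  for (const paragraph of paragraphs) {
    const numbering = paragraph.numbering;
    if (numbering && numbering.isOrdered) {
      const number = (counters.get(numbering.numId) || 0) + 1;
      counters.set(numbering.numId, number);
      const item = `${number}. ${paragraph.text}`;
      if (previousNumId === numbering.numId) {
        // Same list as the previous paragraph: keep the list tight.
        blocks[blocks.length - 1] += `\n${item}`;
      } else {
        blocks.push(item);
      }
      previousNumId = numbering.numId;
    } else {
      blocks.push(paragraph.text);
      previousNumId = null;
    }
  }
  // Blank lines between blocks: in CommonMark, an ordered list that starts
  // at a number other than 1 cannot interrupt a paragraph.
  return blocks.join("\n\n");
}

const ordered = (numId) => ({ isOrdered: true, level: "0", numId });
const paragraphs = [
  { text: "one", numbering: ordered("1") },
  { text: "two", numbering: ordered("1") },
  { text: "Something else" },
  { text: "three", numbering: ordered("1") },
  { text: "Something else" },
  { text: "one", numbering: ordered("4") },
  { text: "two", numbering: ordered("4") },
];

console.log(toMarkdown(paragraphs));
```

The output renders "3. three" as a list starting at 3. One limitation: two different lists that directly follow each other would be separated only by a blank line, and CommonMark would merge them into a single loose list.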

tripodsan • Sep 19 '22

Given this is a library for generating HTML, that feels like a pretty important part!

It would also be useful to see examples of the further processing so that I can understand how this would be used in context.

mwilliamson • Sep 19 '22

I think it's a great library for parsing DOCX and turning it into a syntax tree. The HTML generation is a nice side effect :-)

It's a bit complicated to explain in a short code snippet (I invited you to our repo).

Anyway, I will come up with a PR that also covers the HTML side: <ol> output that handles numbering continued across lists.

tripodsan • Sep 19 '22

I'm not generally accepting pull requests at the moment, since it usually ends up taking more time and effort (due to rounds of review, and having to port the changes to multiple implementations). Discussions of the high level approach are welcome though.

mwilliamson • Sep 19 '22

I'm running into a similar issue: I would like the generated HTML to handle list continuations. I think the most semantic way to address this in HTML would be to use the start attribute on the <ol> elements.

Input:

1. one
2. two
Something else
3. three
4. four
Something else
1. one
2. two

Expected output:

<ol>
  <li>one</li>
  <li>two</li>
</ol>
<p>Something else</p>
<ol start="3">
  <li>three</li>
  <li>four</li>
</ol>
<p>Something else</p>
<ol>
  <li>one</li>
  <li>two</li>
</ol>

The high-level logic would be to set the start attribute on any <ol> element that does not start with 1.
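The following is a minimal sketch of that logic. It assumes level-0 paragraphs that carry the numId proposed earlier in this issue (mammoth's numbering objects don't include it). toHtml, escapeHtml and the numId values are hypothetical. Consecutive ordered items with the same numId share one <ol>, and an <ol> whose first item isn't number 1 gets a start attribute. With the input above, this reproduces the expected output.

```javascript
// Minimal escaping for text content (not a full HTML serializer).
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: consecutive ordered items with the same numId share
// one <ol>; an <ol> whose first item isn't number 1 gets a start attribute.
// Assumes level-0 lists only and that every list starts at 1.
function toHtml(paragraphs) {
  const counters = new Map(); // numId -> last number used
  const lines = [];
  let openNumId = null; // numId of the <ol> currently open, if any
  const closeList = () => {
    if (openNumId !== null) {
      lines.push("</ol>");
      openNumId = null;
    }
  };
  for (const paragraph of paragraphs) {
    const numbering = paragraph.numbering;
    if (numbering && numbering.isOrdered) {
      const number = (counters.get(numbering.numId) || 0) + 1;
      counters.set(numbering.numId, number);
      if (openNumId !== numbering.numId) {
        closeList();
        lines.push(number === 1 ? "<ol>" : `<ol start="${number}">`);
        openNumId = numbering.numId;
      }
      lines.push(`  <li>${escapeHtml(paragraph.text)}</li>`);
    } else {
      closeList();
      lines.push(`<p>${escapeHtml(paragraph.text)}</p>`);
    }
  }
  closeList();
  return lines.join("\n");
}

// The input above: numIds "1" and "2" are made up for illustration.
const item = (text, numId) => ({ text, numbering: { isOrdered: true, level: "0", numId } });
const paragraphs = [
  item("one", "1"), item("two", "1"), { text: "Something else" },
  item("three", "1"), item("four", "1"), { text: "Something else" },
  item("one", "2"), item("two", "2"),
];

console.log(toHtml(paragraphs));
```

The counter lives per numId rather than per <ol>, so a list interrupted by other paragraphs resumes where it left off. A new numId always starts at 1.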

kiejo • Aug 29 '23

Any updates on this? It's been over half a year since the last comment and over 1.5 years since the last maintainer commented on this.

inimeseke • Apr 11 '24

As above, minimal example documents, along with the actual and expected HTML, would be helpful.

mwilliamson • Apr 11 '24

I believe that issue #394 contains these examples.

As @kiejo wrote above, the high-level logic would be to set the start attribute on any <ol> element that does not start with 1.

inimeseke • Apr 12 '24