Inline placeholder image causes other normal images to be replaced

Open pimlottc-gov opened this issue 5 years ago • 3 comments

When an inline placeholder image is to be replaced (i.e. it sits within image replacement fields), non-placeholder images that appear later in the document can end up being replaced along with it.

For example:

Template:

inline template

Expected:

inline correct

Actual:

inline incorrect

Attached is a modified images_template.docx that demonstrates the issue as shown above: images_template.docx

pimlottc-gov, Dec 30 '19 23:12

What seems to be happening here is that, when the placeholder image is inline with the tags, the start_field and end_field end up within the same w:p element, so start_node and end_node are the same node. When the ImageBlock tries to collect the body nodes, it pulls in the entire rest of the document, so replace ends up replacing the first image in every subsequent node.

The solution seems to be to update the body method in blocks.rb to check for this condition:

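        def body
          # Proposed guard: both fields share one paragraph, so nothing lies between them.
          return [] if start_node == end_node
          # ... rest of the existing method unchanged ...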

However, this code is used by multiple other Block subclasses, and I'm not an expert in WordML, so I'm not certain whether it would cause problems for other blocks or situations.
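To make the failure mode concrete, here is a rough sketch under loud assumptions: paragraphs are modelled as entries in a plain array rather than Nokogiri nodes, and collect_body is a hypothetical helper that only approximates how a body collector might walk from start_node to end_node. It is not sablon's actual code.

```ruby
# Hypothetical model: "body" is every paragraph after start_node, up to but
# not including end_node. The guard mirrors the check proposed above.
def collect_body(paragraphs, start_node, end_node, guard: false)
  return [] if guard && start_node.equal?(end_node)

  body = []
  index = paragraphs.index { |para| para.equal?(start_node) } + 1
  # The walk starts after start_node, so if end_node is that same paragraph
  # it is never reached and the loop runs to the end of the document.
  while index < paragraphs.length && !paragraphs[index].equal?(end_node)
    body << paragraphs[index]
    index += 1
  end
  body
end

paragraphs = [
  "p1: start tag, placeholder image, end tag",
  "p2: normal image",
  "p3: text"
]
same_paragraph = paragraphs[0]

without_guard = collect_body(paragraphs, same_paragraph, same_paragraph)
with_guard = collect_body(paragraphs, same_paragraph, same_paragraph, guard: true)

puts without_guard.length # prints 2
puts with_guard.length    # prints 0
```

Without the guard, the rest of the document is treated as the block body, which matches the symptom of later images being replaced.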

pimlottc-gov, Dec 30 '19 23:12

Interesting. I thought I had fixed the inline replacement problem in #131, but hopefully I'll have time later this week to look into it further.

stadelmanma, Dec 31 '19 01:12

Thanks. I just realized there's another case that my proposed fix doesn't address: when another inline image precedes the placeholder image in the same paragraph. replace searches the entire w:p element containing the placeholder and matches the first image it finds, even though that image comes before the starting tag.

I think what's really needed is a more robust algorithm that walks the XML tree and picks only the nodes that are actually between the start and end nodes, in document order. I'm not sure exactly what that would look like yet, but I think some sort of modified depth-first traversal might do the trick.

This seems like it would be a common problem when parsing Office Open XML documents; perhaps there is a well-known algorithm that can be reused.
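As a starting point, the traversal idea might look something like the sketch below. Everything in it is an assumption made for illustration: DocNode is a plain Ruby struct standing in for a Nokogiri node, doc_node and nodes_between are made-up helper names, and the "field" elements are simplified stand-ins for the real start and end tags. It walks the tree depth-first in document order and keeps only nodes seen after the start field and before the end field.

```ruby
# Hypothetical stand-in for an XML node (sablon itself works on Nokogiri nodes).
DocNode = Struct.new(:name, :label, :children)

def doc_node(name, label = nil, children = [])
  DocNode.new(name, label, children)
end

# Pre-order depth-first walk that collects the nodes strictly between
# start_node and end_node in document order. The subtrees of the two
# boundary nodes themselves are skipped.
def nodes_between(root, start_node, end_node)
  collected = []
  state = :before
  visit = lambda do |node|
    next if state == :after
    if node.equal?(start_node)
      state = :inside
      next
    end
    if node.equal?(end_node)
      state = :after
      next
    end
    collected << node if state == :inside
    node.children.each { |child| visit.call(child) }
  end
  visit.call(root)
  collected
end

start_field = doc_node("field", "image:start")
end_field = doc_node("field", "image:end")
first_paragraph = doc_node("w:p", nil, [
  doc_node("w:drawing", "earlier inline image"),
  start_field,
  doc_node("w:drawing", "placeholder"),
  end_field
])
second_paragraph = doc_node("w:p", nil, [doc_node("w:drawing", "later image")])
body = doc_node("w:body", nil, [first_paragraph, second_paragraph])

images = nodes_between(body, start_field, end_field)
         .select { |node| node.name == "w:drawing" }
puts images.map(&:label).inspect # prints ["placeholder"]
```

Because the boundaries are the field nodes rather than their enclosing paragraphs, both the earlier inline image and the image in the following paragraph are left alone. One caveat of this sketch: when the fields sit in different paragraphs, a container that only partly falls inside the range is still collected, so callers would need to filter by element type as done here.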

Template: inline2-template

Expected: inline2-correct

Actual (with proposed fix): inline2-incorrect

pimlottc-gov, Dec 31 '19 17:12