docx

Exception thrown when calling to_html on a file with internal hyperlinks

Open · ycp3 opened this issue 1 year ago · 2 comments

Describe the bug

Calling to_html on a file with internal hyperlinks (hyperlinks to a bookmark or a heading within the same file) raises undefined method `value' for nil:NilClass.

Backtrace:

docx (0.8.0) lib/docx/containers/text_run.rb:106:in `hyperlink_id'
docx (0.8.0) lib/docx/containers/text_run.rb:102:in `href'
docx (0.8.0) lib/docx/containers/text_run.rb:81:in `to_html'
docx (0.8.0) lib/docx/containers/paragraph.rb:48:in `block in to_html'
docx (0.8.0) lib/docx/containers/paragraph.rb:47:in `each'
docx (0.8.0) lib/docx/containers/paragraph.rb:47:in `to_html'
docx (0.8.0) lib/docx/document.rb:119:in `map'
docx (0.8.0) lib/docx/document.rb:119:in `to_html' 

According to the linked reference, internal hyperlinks use the anchor attribute (w:anchor) instead of the relationship id attribute (r:id), which breaks line 106 in text_run.rb.
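To illustrate the difference between the two kinds of link, here is a minimal sketch of a lookup that checks for `r:id` first and falls back to `w:anchor`. It uses Ruby's bundled REXML rather than the gem's own parsing code, and the helper name `hyperlink_href`, the relationships hash, and the sample XML are all hypothetical; it is not the gem's implementation or an official fix.

```ruby
require 'rexml/document'

# WordprocessingML and relationship namespace URIs (ECMA-376).
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
R_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'

# Hypothetical helper, not part of the docx gem: resolve the target of a
# <w:hyperlink> element without assuming r:id is present.
#   - external links carry r:id, looked up in a relationship-id => URL map
#   - internal links carry w:anchor, the name of a bookmark in the document
# Returns nil when neither attribute is present, so a caller can fall back
# to plain text instead of calling .value on nil.
def hyperlink_href(hyperlink, relationships)
  rel_id = hyperlink.attribute('id', R_NS)&.value
  return relationships[rel_id] if rel_id

  anchor = hyperlink.attribute('anchor', W_NS)&.value
  return "##{anchor}" if anchor

  nil
end

# Made-up paragraph with one external and one internal link.
xml = <<~XML
  <w:p xmlns:w="#{W_NS}" xmlns:r="#{R_NS}">
    <w:hyperlink r:id="rId5"><w:r><w:t>External</w:t></w:r></w:hyperlink>
    <w:hyperlink w:anchor="_Toc123"><w:r><w:t>Internal</w:t></w:r></w:hyperlink>
  </w:p>
XML

relationships = { 'rId5' => 'https://example.com/' } # made-up relationship map
links = REXML::XPath.match(REXML::Document.new(xml), '//w:hyperlink', 'w' => W_NS)
links.each { |link| puts hyperlink_href(link, relationships) }
```

The external link resolves through the relationship map, while the internal one becomes a same-page fragment built from the bookmark name.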

To Reproduce

Open a docx file with a hyperlink to either a heading or a bookmark in the same file and call to_html.

Example

require 'docx'

doc = Docx::Document.new('/path/to/your/docx/file_with_internal_hyperlink.docx')

doc.to_html

Sample docx file

https://docs.google.com/document/d/1H01zgmdC2LHAAwXAhmm6RyEz-lwbZm6R/edit?usp=sharing&ouid=103282161859668866778&rtpof=true&sd=true

Expected behavior

No exception is raised; the HTML is returned as normal.

Environment

  • Ruby version: 3.2.2
  • docx gem version: 0.8.0
  • OS: Alpine 3.17 docker container

ycp3, Oct 06 '23 14:10

Hi @satoryu. Any idea what could be happening here? I seem to be having a similar problem with every docx version newer than 0.5.0. Version 0.5.0 and older just sanitize the hyperlinks and print them as plain text.

I'm on Ruby 3.1.4, Ubuntu 20.04.

Backtrace:

undefined method `[]' for nil:NilClass
 @document_properties[:hyperlinks][hyperlink_id]
                                  ^^^^^^^^^^^^^^
docx-0.6.0/lib/docx/containers/text_run.rb:100:in `href'
docx-0.6.0/lib/docx/containers/text_run.rb:79:in `to_html'
docx-0.6.0/lib/docx/containers/paragraph.rb:48:in `block in to_html'
docx-0.6.0/lib/docx/containers/paragraph.rb:47:in `each'
docx-0.6.0/lib/docx/containers/paragraph.rb:47:in `to_html'
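In this 0.6.0 backtrace the receiver of `[hyperlink_id]` appears to be nil, i.e. `@document_properties[:hyperlinks]` itself was nil. A small sketch of that failure mode with a plain Hash (the variable names mirror the backtrace, but the values are made up and this is not the gem's code), next to `Hash#dig`, which returns nil instead of raising:

```ruby
# Hypothetical stand-in for @document_properties with no :hyperlinks entry.
document_properties = {}
hyperlink_id = 'rId1' # made-up relationship id

begin
  # Chained [] calls: the first returns nil, so the second raises.
  document_properties[:hyperlinks][hyperlink_id]
rescue NoMethodError => e
  puts "raised #{e.class}"
end

# Hash#dig stops at the first missing key and returns nil instead.
href = document_properties.dig(:hyperlinks, hyperlink_id)
p href
```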

mateusg, Oct 17 '23 23:10

@ycp3 @mateusg Thank you for your reports.

I've just found out the root cause: this gem does not support internal links. I would like to fix this issue but need time.

> I seem to be having a similar problem on any docx version bigger than 0.5.0. 0.5.0 and older versions just sanitize the hyperlinks and print the plain text.

Yes, right. Do you think that printing external links as sanitized text makes sense?
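If links without a resolvable target were rendered as sanitized text, as 0.5.0 and older reportedly did, the rendering step might look like this minimal sketch. `render_link` is a hypothetical helper, not the gem's API; it emits an `<a>` tag only when an href is known and escapes text with `ERB::Util.html_escape` from Ruby's standard library.

```ruby
require 'erb'

# Hypothetical helper, not part of the docx gem: wrap the run text in an
# <a> tag when an href is known, otherwise return escaped plain text.
def render_link(text, href)
  escaped = ERB::Util.html_escape(text)
  return escaped if href.nil? || href.empty?

  %(<a href="#{ERB::Util.html_escape(href)}">#{escaped}</a>)
end

puts render_link('See Chapter 2', '#_Toc123') # internal link kept as a fragment
puts render_link('Fish & Chips', nil)         # no target: sanitized plain text
```

Falling back to escaped text keeps to_html from raising on links it cannot resolve, at the cost of dropping the link itself.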

satoryu, Oct 21 '23 04:10