Unable to decode HTML entities

Open • asbjornu opened this issue 2 years ago • 4 comments

I want to convert HTML to plain text within a Kramdown plugin I'm making, and I'm unable to get HTML entities decoded no matter what I do. Here's one of many things I've tried:

require "kramdown"

html = "<h1>&amp; &gt; &lt;</h1>"
doc = Kramdown::Document.new(html, input: :html, entity_output: :as_char)
puts doc.to_kramdown # Outputs: # &amp; &gt; &lt;

I expected the output to be # & > < and not # &amp; &gt; &lt;. What am I doing wrong here?

asbjornu • Oct 25 '21 12:10

You are doing nothing wrong; this is simply how the conversion is done. The entities for <, >, & and " are not converted to characters, even with entity_output: :as_char.
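If plain text is the final goal, one workaround is to post-process the kramdown output and decode only the entities the converter keeps escaped. This is a minimal sketch, assuming that set is exactly the four named in the reply (`&lt;`, `&gt;`, `&amp;`, `&quot;`); `decode_kept_entities` is a hypothetical helper, not part of kramdown.

```ruby
# Hypothetical helper, not a kramdown API: decodes only the four entities
# that the kramdown converter is described as leaving escaped.
KEPT_ENTITIES = {
  "&lt;"   => "<",
  "&gt;"   => ">",
  "&amp;"  => "&",
  "&quot;" => "\""
}.freeze

def decode_kept_entities(text)
  # One gsub pass with a Hash replacement: each match is looked up once,
  # so "&amp;lt;" becomes "&lt;" instead of being decoded twice into "<".
  text.gsub(/&(?:lt|gt|amp|quot);/, KEPT_ENTITIES)
end

puts decode_kept_entities("# &amp; &gt; &lt;") # prints "# & > <"
```

Once decoded, the string is no longer safe to feed back into kramdown (a bare `<` may be read as HTML), so this only fits when the result is the final plain-text output.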

gettalong • Oct 25 '21 13:10

Thanks for the quick reply! Would it be possible to change that behavior somehow?

asbjornu • Oct 25 '21 13:10

I'm open to pull requests that adjust this behaviour in the kramdown converter. However, the underlying utility function should not be changed, because it is used in several places.

gettalong • Oct 25 '21 18:10

Thanks, @gettalong. I may have a look at this when time allows. For now, I've circumvented the issue by using REXML directly.
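A REXML-based workaround could look roughly like the following. This is a minimal sketch, not the code actually used in the issue: it assumes the input is well-formed XML (REXML is an XML parser, so HTML-only named entities such as `&nbsp;` may be rejected), and `html_to_text` is a hypothetical helper name.

```ruby
require "rexml/document"

# Hypothetical helper: walks the parsed tree and joins all text nodes.
# REXML::Text#value returns the text with entities decoded,
# so "&amp;" comes back as "&".
def html_to_text(html)
  collect_text(REXML::Document.new(html).root)
end

def collect_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value          # decoded text (also CDATA)
    when REXML::Element then collect_text(child)  # recurse into nested tags
    else ""                                       # skip comments, PIs, etc.
    end
  end.join
end

puts html_to_text("<h1>&amp; &gt; &lt;</h1>") # prints "& > <"
```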

asbjornu • Oct 26 '21 13:10