
`>` in attribute breaks formatting

Open andyw8 opened this issue 1 year ago • 4 comments

<div foo="a>b"
     bar="1">

is incorrectly formatted to

<div foo="a>b"
  bar="1">

(Without the > it works correctly).

This causes problems for Hotwire/Stimulus since it uses notation such as:

<button data-action="click->hello#greet">Greet</button>
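For anyone wanting to reproduce the cause in isolation, here is a minimal sketch. It assumes the stock `ELEMENT_CONTENT` pattern quoted later in this thread; `DIV_OPEN` and `split_opening_tag` are hypothetical names, a simplified stand-in for the gem's block-element mapping rather than its actual code.

```ruby
require "strscan"

# Stock pattern, as quoted later in this thread.
STOCK_ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx

# Hypothetical, simplified stand-in for the open-block-element mapping,
# limited to <div> to keep the example short.
DIV_OPEN = %r{<div(?: #{STOCK_ELEMENT_CONTENT})?>}m

# Returns [matched opening tag, unscanned remainder].
def split_opening_tag(html, pattern = DIV_OPEN)
  scanner = StringScanner.new(html)
  [scanner.scan(pattern), scanner.rest]
end

tag, rest = split_opening_tag(%(<div foo="a>b"\n     bar="1">))
puts tag.inspect   # the tag is cut off at the first ">" inside the attribute
puts rest.inspect  # the remaining attribute lines are left over as plain text
```

Because the leftover `bar="1">` is then treated as ordinary text inside the element, it would be re-indented like any other content, which seems to match the output reported above.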

andyw8 avatar Feb 06 '24 14:02 andyw8

So... I ran into this and figured out a solution, specifically for Stimulus, but it's not exactly pretty. Leaving my notes here in case anybody else wants to give it a go.

After digging through the source of this project and iterating over a LOT of regex attempts, I found the fix lies in the ELEMENT_CONTENT pattern:

ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx
# change to
ELEMENT_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx

And in the :open_element mapping:

[%r{<#{ELEMENT_CONTENT}[^/]>}om,
      :open_element],
# change to
[%r{<#{ELEMENT_CONTENT}[^/]*?>}om,
      :open_element],

Together, these changes let the parser consume a whole data-action="[anything]" attribute whose value contains -> as a single unit, so the > inside it no longer ends the tag.
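A quick way to compare the two pattern pairs outside the gem is sketched below. The regexes are copied from this comment; the constant names and the `opening_tag` helper are hypothetical and only exist for this illustration.

```ruby
# Patterns copied from the comment above; constant names are illustrative only.
STOCK_CONTENT   = %r{ (?:<%.*?%>|[^>])* }mx
PATCHED_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx

STOCK_OPEN   = %r{<#{STOCK_CONTENT}[^/]>}m
PATCHED_OPEN = %r{<#{PATCHED_CONTENT}[^/]*?>}m

# Hypothetical helper: returns the first substring matching the pattern.
def opening_tag(html, pattern)
  html[pattern]
end

BUTTON = %(<button data-action="click->hello#greet">Greet</button>)

puts opening_tag(BUTTON, STOCK_OPEN).inspect    # stops at the ">" of "->"
puts opening_tag(BUTTON, PATCHED_OPEN).inspect  # covers the whole opening tag
```

Note that the patched pattern names `data-action` explicitly, so a `>` inside any other attribute (such as the `foo="a>b"` case from the original report) would still be cut off at the first `>`.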

I tried to monkey-patch this in, but htmlbeautifier runs as a standalone executable rather than being loaded by something like Rails, which would run an initializer containing the patch. There are other ways, but they didn't work well for me.

What did work well for me is... a bit rougher, but overall a self-contained solution.

I created a new bin script, bin/erb, and essentially collapsed this project down into that single executable script (make sure you chmod it to make it executable!) with my edits inside.

[!NOTE]

I also prefer having a line break between basically every disparate element in my HTML, so I also tweaked Builder#emit: you'll see the "JonSully override" comment there. Feel free to restore the stock line if you prefer the original behavior.
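To make the effect of that tweak concrete, here is a stripped-down sketch. `MiniEmitter` and `render` are hypothetical and model only the `@new_line` flag from `Builder#emit`, ignoring indentation and everything else the real class does.

```ruby
# Hypothetical, simplified model of Builder#emit: only the @new_line flag.
class MiniEmitter
  def initialize(output, force_breaks:)
    @output = output
    @force_breaks = force_breaks # true mimics the override, false the stock line
    @new_line = false
    @empty = true
  end

  def emit(string)
    @output << "\n" if @new_line && !@empty
    @output << string
    @new_line = @force_breaks # stock: false, override: true
    @empty = false
  end
end

# Feeds the tokens through a MiniEmitter and returns the resulting string.
def render(tokens, force_breaks:)
  output = +""
  emitter = MiniEmitter.new(output, force_breaks: force_breaks)
  tokens.each { |token| emitter.emit(token) }
  output
end

tokens = ["<span>", "Greet", "</span>"]
puts render(tokens, force_breaks: false) # everything stays on one line
puts render(tokens, force_breaks: true)  # each token lands on its own line
```

In this simplified model, leaving the flag set after every emit is what puts a line break before each subsequent token.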

The code for `bin/erb`:

#!/usr/bin/env ruby

# NOTE: Bundles up the gem `htmlbeautifier` into a single executable Ruby script
# NOTE: Contains a couple of overrides from the stock script.

# NOTE: Set `execute path` in the VS Code plugin to simply `bin/erb` (This file)

require "strscan"
require "optparse"
require "fileutils"
require "stringio"

class Parser
  def initialize
    @maps = []
    yield self if block_given?
  end

  def map(pattern, method)
    @maps << [pattern, method]
  end

  def scan(subject, receiver)
    @scanner = StringScanner.new(subject)
    dispatch(receiver) until @scanner.eos?
  end

  def source_so_far
    @scanner.string[0...@scanner.pos]
  end

  def source_line_number
    [source_so_far.chomp.split(%r{\n}).count, 1].max
  end

  private

  def dispatch(receiver)
    _, method = @maps.find { |pattern, _| @scanner.scan(pattern) }
    raise "Unmatched sequence" unless method

    receiver.__send__(method, *extract_params(@scanner))
  rescue => e
    raise "#{e.message} on line #{source_line_number}"
  end

  def extract_params(scanner)
    return [scanner[0]] unless scanner[1]

    params = []
    i = 1
    while scanner[i]
      params << scanner[i]
      i += 1
    end
    params
  end
end

class HtmlParser < Parser
  # ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx # stock
  ELEMENT_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx # JonSully override

  HTML_VOID_ELEMENTS = %r{(?:
    area | base | br | col | command | embed | hr | img | input | keygen |
    link | meta | param | source | track | wbr
  )}mix
  HTML_BLOCK_ELEMENTS = %r{(?:
    address | article | aside | audio | blockquote | canvas | dd | details |
    dir | div | dl | dt | fieldset | figcaption | figure | footer | form |
    h1 | h2 | h3 | h4 | h5 | h6 | header | hr | li | menu | noframes |
    noscript | ol | p | pre | section | table | tbody | td | tfoot | th |
    thead | tr | ul | video
  )}mix

  MAPPINGS = [
    [%r{(<%-?=?)(.*?)(-?%>)}om,
      :embed],
    [%r{<!--\[.*?\]>}om,
      :open_ie_cc],
    [%r{<!\[.*?\]-->}om,
      :close_ie_cc],
    [%r{<!--.*?-->}om,
      :standalone_element],
    [%r{<!.*?>}om,
      :standalone_element],
    [%r{(<script#{ELEMENT_CONTENT}>)(.*?)(</script>)}omi,
      :foreign_block],
    [%r{(<style#{ELEMENT_CONTENT}>)(.*?)(</style>)}omi,
      :foreign_block],
    [%r{(<pre#{ELEMENT_CONTENT}>)(.*?)(</pre>)}omi,
      :preformatted_block],
    [%r{(<textarea#{ELEMENT_CONTENT}>)(.*?)(</textarea>)}omi,
      :preformatted_block],
    [%r{<#{HTML_VOID_ELEMENTS}(?: #{ELEMENT_CONTENT})?/?>}om,
      :standalone_element],
    [%r{</#{HTML_BLOCK_ELEMENTS}>}om,
      :close_block_element],
    [%r{<#{HTML_BLOCK_ELEMENTS}(?: #{ELEMENT_CONTENT})?>}om,
      :open_block_element],
    [%r{</#{ELEMENT_CONTENT}>}om,
      :close_element],

    # [%r{<#{ELEMENT_CONTENT}[^/]>}om, # stock
    #   :open_element],
    [%r{<#{ELEMENT_CONTENT}[^/]*?>}om, # JonSully override
      :open_element],

    [%r{<[\w\-]+(?: #{ELEMENT_CONTENT})?/>}om,
      :standalone_element],
    [%r{(\s*\r?\n\s*)+}om,
      :new_lines],
    [%r{[^<\n]+},
      :text]
  ].freeze

  def initialize
    super do |p|
      MAPPINGS.each do |regexp, method|
        p.map regexp, method
      end
    end
  end
end

class RubyIndenter
  INDENT_KEYWORDS = %w[if elsif else unless while until begin for case when].freeze
  OUTDENT_KEYWORDS = %w[elsif else end when].freeze
  RUBY_INDENT = %r{
    ^ ( #{INDENT_KEYWORDS.join("|")} )\b
    | \b ( do | \{ ) ( \s* \| [^|]+ \| )? $
  }xo
  RUBY_OUTDENT = %r{ ^ ( #{OUTDENT_KEYWORDS.join("|")} | \} ) \b }xo

  def outdent?(lines)
    lines.first =~ RUBY_OUTDENT
  end

  def indent?(lines)
    lines.last =~ RUBY_INDENT
  end
end

class Builder
  DEFAULT_OPTIONS = {
    indent: "  ",
    initial_level: 0,
    stop_on_errors: false,
    keep_blank_lines: 0
  }.freeze

  def initialize(output, options = {})
    options = DEFAULT_OPTIONS.merge(options)
    @tab = options[:indent]
    @stop_on_errors = options[:stop_on_errors]
    @level = options[:initial_level]
    @keep_blank_lines = options[:keep_blank_lines]
    @new_line = false
    @empty = true
    @ie_cc_levels = []
    @output = output
    @embedded_indenter = RubyIndenter.new
  end

  private

  def error(text)
    return unless @stop_on_errors

    raise text
  end

  def indent
    @level += 1
  end

  def outdent
    error "Extraneous closing tag" if @level == 0
    @level = [@level - 1, 0].max
  end

  def emit(*strings)
    strings_join = strings.join("")
    @output << "\n" if @new_line && !@empty
    @output << (@tab * @level) if @new_line && !strings_join.strip.empty?
    @output << strings_join
    # @new_line = false # stock
    @new_line = true # JonSully override
    @empty = false
  end

  def new_line
    @new_line = true
  end

  def embed(opening, code, closing)
    lines = code.split(%r{\n}).map(&:strip)
    outdent if @embedded_indenter.outdent?(lines)
    emit opening, code, closing
    indent if @embedded_indenter.indent?(lines)
  end

  def foreign_block(opening, code, closing)
    emit opening
    emit_reindented_block_content code unless code.strip.empty?
    emit closing
  end

  def emit_reindented_block_content(code)
    lines = code.strip.split(%r{\n})
    indentation = foreign_block_indentation(code)

    indent
    new_line
    lines.each do |line|
      emit line.rstrip.sub(%r{^#{indentation}}, "")
      new_line
    end
    outdent
  end

  def foreign_block_indentation(code)
    code.split(%r{\n}).find { |ln| !ln.strip.empty? }[%r{^\s+}]
  end

  def preformatted_block(opening, content, closing)
    new_line
    emit opening, content, closing
    new_line
  end

  def standalone_element(elem)
    emit elem
    new_line if elem =~ %r{^<br[^\w]}
  end

  def close_element(elem)
    outdent
    emit elem
  end

  def close_block_element(elem)
    close_element elem
    new_line
  end

  def open_element(elem)
    emit elem
    indent
  end

  def open_block_element(elem)
    new_line
    open_element elem
  end

  def close_ie_cc(elem)
    if @ie_cc_levels.empty?
      error "Unclosed conditional comment"
    else
      @level = @ie_cc_levels.pop
    end
    emit elem
  end

  def open_ie_cc(elem)
    emit elem
    @ie_cc_levels.push @level
    indent
  end

  def new_lines(*content)
    blank_lines = content.first.scan(%r{\n}).count - 1
    blank_lines = [blank_lines, @keep_blank_lines].min
    @output << ("\n" * blank_lines)
    new_line
  end

  alias_method :text, :emit
end





# 1. If no files are listed, it will read from standard input and write to
# standard output.
# 2. If files are listed, it will modify each file in place, overwriting it
# with the beautified output.

# Available options are:
# tab_stops - an integer for the number of spaces to indent, default 2.
# Deprecated: see indent.
# indent - what to indent with ("  ", "\t" etc.), default "  "
# stop_on_errors - raise an exception on a badly-formed document. Default
# is false, i.e. continue to process the rest of the document.
# initial_level - The entire output will be indented by this number of steps.
# Default is 0.
# keep_blank_lines - an integer for the number of consecutive empty lines
# to keep in output.
#
  
# -------------- ACTUALLY DO THE THING --------------

def do_beautify(html, options = {})
  if options[:tab_stops]
    options[:indent] = " " * options[:tab_stops]
  end
  String.new.tap { |output|
    HtmlParser.new.scan html.to_s, Builder.new(output, options)
  }
end





def beautify(name, input, output, options)
  output.puts do_beautify(input, options)
rescue => e
  raise "Error parsing #{name}: #{e}"
end

executable = File.basename(__FILE__)

options = {indent: "  "}
parser = OptionParser.new do |opts|
  opts.banner = "Usage: #{executable} [options] [file ...]"
  opts.separator <<~STRING

    #{executable} has two modes of operation:

    1. If no files are listed, it will read from standard input and write to
       standard output.
    2. If files are listed, it will modify each file in place, overwriting it
       with the beautified output.

    The following options are available:

  STRING
  opts.on(
    "-t", "--tab-stops NUMBER", Integer,
    "Set number of spaces per indent (default #{options[:indent].length})"
  ) do |num|
    options[:indent] = " " * num
  end
  opts.on(
    "-T", "--tab",
    "Indent using tabs"
  ) do
    options[:indent] = "\t"
  end
  opts.on(
    "-i", "--indent-by NUMBER", Integer,
    "Indent the output by NUMBER steps (default 0)."
  ) do |num|
    options[:initial_level] = num
  end
  opts.on(
    "-e", "--stop-on-errors",
    "Stop when invalid nesting is encountered in the input"
  ) do |num|
    options[:stop_on_errors] = num
  end
  opts.on(
    "-b", "--keep-blank-lines NUMBER", Integer,
    "Set number of consecutive blank lines"
  ) do |num|
    options[:keep_blank_lines] = num
  end
  opts.on(
    "-l", "--lint-only",
    "Lint only, error on files which would be modified",
    "This is not available when reading from standard input"
  ) do |num|
    options[:lint_only] = num
  end
end

parser.parse!

if ARGV.any?
  failures = []
  ARGV.each do |path|
    input = File.read(path)
    if options[:lint_only]
      output = StringIO.new
      beautify path, input, output, options
      failures << path unless input == output.string
    else
      temppath = "#{path}.tmp"
      File.open(temppath, "w") do |file|
        beautify path, input, file, options
      end
      FileUtils.mv temppath, path
    end
  end
  unless failures.empty?
    warn [
      "Lint failed - files would be modified:",
      *failures
    ].join("\n")
    exit 1
  end
else
  beautify "standard input", $stdin.read, $stdout, options
end

I told you it wasn't exactly pretty! But now if we set the VS Code extension to use a custom "execute path", setting it to simply bin/erb, it'll work.

Plus we can uninstall the htmlbeautifier gem itself, since we're now running our own plain Ruby script rather than the gem.

jon-sully avatar Sep 23 '24 19:09 jon-sully

Thanks for looking into that!

I'm curious though, why didn't you make the changes in a branch and point your Gemfile to that?

andyw8 avatar Sep 24 '24 01:09 andyw8

Yeah I guess that could've worked, I just found the library to be so small that it felt simpler to inline. Maybe I'll swap at some point, but it'll be easier to 'ship' changes in the future for my whole team if it's in git

jon-sully avatar Sep 24 '24 01:09 jon-sully

PR: https://github.com/threedaymonk/htmlbeautifier/pull/82

andyw8 avatar Oct 14 '24 17:10 andyw8