`>` in attribute breaks formatting
<div foo="a>b"
bar="1">
is incorrectly formatted to
<div foo="a>b"
bar="1">
(Without the `>` it works correctly.)
This causes problems for Hotwire/Stimulus since it uses notation such as:
<button data-action="click->hello#greet">Greet</button>
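To make the failure concrete, here is a minimal sketch that runs the stock open-element pattern (as quoted later in this thread) against the Stimulus example. The constant names and the `stock_open_tag` helper are made up for illustration and are not part of htmlbeautifier.

```ruby
require "strscan"

# Stock patterns as quoted in this thread (renamed here for illustration).
STOCK_ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx
STOCK_OPEN_ELEMENT = %r{<#{STOCK_ELEMENT_CONTENT}[^/]>}om

# Hypothetical helper: returns whatever the stock open-element pattern
# consumes from the start of `html`, or nil if it does not match.
def stock_open_tag(html)
  StringScanner.new(html).scan(STOCK_OPEN_ELEMENT)
end

puts stock_open_tag('<button data-action="click->hello#greet">Greet</button>')
# prints: <button data-action="click->
```

The pattern treats the first `>` it can reach as the end of the tag, so the rest of the attribute value is handed to the formatter as if it were text.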
So... I ran into this and figured out a solution, specifically for Stimulus, but it's not exactly pretty. Leaving my notes here in case anybody else wants to give it a go.
After digging through the source of this project and iterating over a LOT of regex attempts, the secret lies in the `ELEMENT_CONTENT` regex in `HtmlParser`:
ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx
# change to
ELEMENT_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx
And the `:open_element` mapping:
[%r{<#{ELEMENT_CONTENT}[^/]>}om,
:open_element],
# change to
[%r{<#{ELEMENT_CONTENT}[^/]*?>}om,
:open_element],
Together, these basically reconfigure the parser to skip over any data-action="[anything]" attribute whose value contains `->`, so the `>` inside it is no longer mistaken for the end of the tag.
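As a rough check of the two changes, the sketch below compares the stock and patched open-element patterns on a few tags. The constant names, the `scan_tag` helper, and the sample tags are illustrative only; the regexes themselves are the ones quoted above.

```ruby
require "strscan"

STOCK_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx
PATCHED_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx

STOCK_OPEN = %r{<#{STOCK_CONTENT}[^/]>}om
PATCHED_OPEN = %r{<#{PATCHED_CONTENT}[^/]*?>}om

# Hypothetical helper: what does `pattern` consume from the start of `html`?
def scan_tag(pattern, html)
  StringScanner.new(html).scan(pattern)
end

STIMULUS = '<button data-action="click->hello#greet">Greet</button>'
OTHER_ATTR = '<div foo="a>b" bar="1">'
SELF_CLOSING = "<my-widget/>"

puts scan_tag(STOCK_OPEN, STIMULUS)     # <button data-action="click->
puts scan_tag(PATCHED_OPEN, STIMULUS)   # <button data-action="click->hello#greet">
puts scan_tag(PATCHED_OPEN, OTHER_ATTR) # <div foo="a>
p scan_tag(STOCK_OPEN, SELF_CLOSING)    # nil
p scan_tag(PATCHED_OPEN, SELF_CLOSING)  # "<my-widget/>"
```

Two things stand out when run in isolation like this. The fix is specific to `data-action`, so a `>` inside any other attribute still ends the match early. The relaxed `[^/]*?>` also lets the open-element pattern match a non-void self-closing tag, which the stock pattern rejected, so that case may be worth checking if your templates use such tags.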
I tried to monkey-patch this in, but ultimately htmlbeautifier isn't loaded by something like Rails, which would initialize monkey patches for you. There are other ways, but they didn't work well for me.
What did work well for me is... a bit rougher, but overall a self-contained solution.
I created a new bin-script called bin/erb, then essentially collapsed this project down into a single executable script (make sure you chmod it to make it executable!) with my edits inside.
[!NOTE]
I also prefer having a line break between basically every disparate element in my HTML, so I also tweaked `Builder#emit`; you'll see the "JonSully override" comment. Feel free to swap that line back to the commented-out stock one if you prefer the original behavior.
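To illustrate what that tweak changes, here is a heavily simplified sketch of the emit logic. `MiniEmitter` and `render` are made-up names, and indentation handling is left out entirely; the only thing modelled is whether `@new_line` is reset to `false` (stock) or set to `true` (override) after each emit.

```ruby
# Simplified stand-in for Builder#emit, for illustration only.
class MiniEmitter
  attr_reader :output

  def initialize(break_after_every_emit:)
    @break_after_every_emit = break_after_every_emit
    @output = String.new
    @new_line = false
    @empty = true
  end

  def emit(string)
    @output << "\n" if @new_line && !@empty
    @output << string
    # stock: false, so the next token stays on the same line
    # override: true, so the next token starts a new line
    @new_line = @break_after_every_emit
    @empty = false
  end
end

# Hypothetical helper: emit each token in turn and return the result.
def render(tokens, break_after_every_emit:)
  emitter = MiniEmitter.new(break_after_every_emit: break_after_every_emit)
  tokens.each { |token| emitter.emit(token) }
  emitter.output
end

tokens = ["<span>", "hello", "</span>"]
p render(tokens, break_after_every_emit: false) # "<span>hello</span>"
p render(tokens, break_after_every_emit: true)  # "<span>\nhello\n</span>"
```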
The Code for `bin/erb` (click to open)
#!/usr/bin/env ruby
# NOTE: Bundles up the gem `htmlbeautifier` into a single executable Ruby script
# NOTE: Contains a couple of overrides from the stock script.
# NOTE: Set `execute path` in the VS Code plugin to simply `bin/erb` (This file)
require "strscan"
require "optparse"
require "fileutils"
require "stringio"
class Parser
def initialize
@maps = []
yield self if block_given?
end
def map(pattern, method)
@maps << [pattern, method]
end
def scan(subject, receiver)
@scanner = StringScanner.new(subject)
dispatch(receiver) until @scanner.eos?
end
def source_so_far
@scanner.string[0...@scanner.pos]
end
def source_line_number
[source_so_far.chomp.split(%r{\n}).count, 1].max
end
private
def dispatch(receiver)
_, method = @maps.find { |pattern, _| @scanner.scan(pattern) }
raise "Unmatched sequence" unless method
receiver.__send__(method, *extract_params(@scanner))
rescue => e
raise "#{e.message} on line #{source_line_number}"
end
def extract_params(scanner)
return [scanner[0]] unless scanner[1]
params = []
i = 1
while scanner[i]
params << scanner[i]
i += 1
end
params
end
end
class HtmlParser < Parser
# ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx # stock
ELEMENT_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx # JonSully override
HTML_VOID_ELEMENTS = %r{(?:
area | base | br | col | command | embed | hr | img | input | keygen |
link | meta | param | source | track | wbr
)}mix
HTML_BLOCK_ELEMENTS = %r{(?:
address | article | aside | audio | blockquote | canvas | dd | details |
dir | div | dl | dt | fieldset | figcaption | figure | footer | form |
h1 | h2 | h3 | h4 | h5 | h6 | header | hr | li | menu | noframes |
noscript | ol | p | pre | section | table | tbody | td | tfoot | th |
thead | tr | ul | video
)}mix
MAPPINGS = [
[%r{(<%-?=?)(.*?)(-?%>)}om,
:embed],
[%r{<!--\[.*?\]>}om,
:open_ie_cc],
[%r{<!\[.*?\]-->}om,
:close_ie_cc],
[%r{<!--.*?-->}om,
:standalone_element],
[%r{<!.*?>}om,
:standalone_element],
[%r{(<script#{ELEMENT_CONTENT}>)(.*?)(</script>)}omi,
:foreign_block],
[%r{(<style#{ELEMENT_CONTENT}>)(.*?)(</style>)}omi,
:foreign_block],
[%r{(<pre#{ELEMENT_CONTENT}>)(.*?)(</pre>)}omi,
:preformatted_block],
[%r{(<textarea#{ELEMENT_CONTENT}>)(.*?)(</textarea>)}omi,
:preformatted_block],
[%r{<#{HTML_VOID_ELEMENTS}(?: #{ELEMENT_CONTENT})?/?>}om,
:standalone_element],
[%r{</#{HTML_BLOCK_ELEMENTS}>}om,
:close_block_element],
[%r{<#{HTML_BLOCK_ELEMENTS}(?: #{ELEMENT_CONTENT})?>}om,
:open_block_element],
[%r{</#{ELEMENT_CONTENT}>}om,
:close_element],
# [%r{<#{ELEMENT_CONTENT}[^/]>}om, # stock
# :open_element],
[%r{<#{ELEMENT_CONTENT}[^/]*?>}om, # JonSully override
:open_element],
[%r{<[\w\-]+(?: #{ELEMENT_CONTENT})?/>}om,
:standalone_element],
[%r{(\s*\r?\n\s*)+}om,
:new_lines],
[%r{[^<\n]+},
:text]
].freeze
def initialize
super do |p|
MAPPINGS.each do |regexp, method|
p.map regexp, method
end
end
end
end
class RubyIndenter
INDENT_KEYWORDS = %w[if elsif else unless while until begin for case when].freeze
OUTDENT_KEYWORDS = %w[elsif else end when].freeze
RUBY_INDENT = %r{
^ ( #{INDENT_KEYWORDS.join("|")} )\b
| \b ( do | \{ ) ( \s* \| [^|]+ \| )? $
}xo
RUBY_OUTDENT = %r{ ^ ( #{OUTDENT_KEYWORDS.join("|")} | \} ) \b }xo
def outdent?(lines)
lines.first =~ RUBY_OUTDENT
end
def indent?(lines)
lines.last =~ RUBY_INDENT
end
end
class Builder
DEFAULT_OPTIONS = {
indent: "  ",
initial_level: 0,
stop_on_errors: false,
keep_blank_lines: 0
}.freeze
def initialize(output, options = {})
options = DEFAULT_OPTIONS.merge(options)
@tab = options[:indent]
@stop_on_errors = options[:stop_on_errors]
@level = options[:initial_level]
@keep_blank_lines = options[:keep_blank_lines]
@new_line = false
@empty = true
@ie_cc_levels = []
@output = output
@embedded_indenter = RubyIndenter.new
end
private
def error(text)
return unless @stop_on_errors
raise text
end
def indent
@level += 1
end
def outdent
error "Extraneous closing tag" if @level == 0
@level = [@level - 1, 0].max
end
def emit(*strings)
strings_join = strings.join("")
@output << "\n" if @new_line && !@empty
@output << (@tab * @level) if @new_line && !strings_join.strip.empty?
@output << strings_join
# @new_line = false # stock
@new_line = true # JonSully override
@empty = false
end
def new_line
@new_line = true
end
def embed(opening, code, closing)
lines = code.split(%r{\n}).map(&:strip)
outdent if @embedded_indenter.outdent?(lines)
emit opening, code, closing
indent if @embedded_indenter.indent?(lines)
end
def foreign_block(opening, code, closing)
emit opening
emit_reindented_block_content code unless code.strip.empty?
emit closing
end
def emit_reindented_block_content(code)
lines = code.strip.split(%r{\n})
indentation = foreign_block_indentation(code)
indent
new_line
lines.each do |line|
emit line.rstrip.sub(%r{^#{indentation}}, "")
new_line
end
outdent
end
def foreign_block_indentation(code)
code.split(%r{\n}).find { |ln| !ln.strip.empty? }[%r{^\s+}]
end
def preformatted_block(opening, content, closing)
new_line
emit opening, content, closing
new_line
end
def standalone_element(elem)
emit elem
new_line if elem =~ %r{^<br[^\w]}
end
def close_element(elem)
outdent
emit elem
end
def close_block_element(elem)
close_element elem
new_line
end
def open_element(elem)
emit elem
indent
end
def open_block_element(elem)
new_line
open_element elem
end
def close_ie_cc(elem)
if @ie_cc_levels.empty?
error "Unclosed conditional comment"
else
@level = @ie_cc_levels.pop
end
emit elem
end
def open_ie_cc(elem)
emit elem
@ie_cc_levels.push @level
indent
end
def new_lines(*content)
blank_lines = content.first.scan(%r{\n}).count - 1
blank_lines = [blank_lines, @keep_blank_lines].min
@output << ("\n" * blank_lines)
new_line
end
alias_method :text, :emit
end
# 1. If no files are listed, it will read from standard input and write to
# standard output.
# 2. If files are listed, it will modify each file in place, overwriting it
# with the beautified output.
# Available options are:
# tab_stops - an integer for the number of spaces to indent, default 2.
# Deprecated: see indent.
# indent - what to indent with ("  ", "\t" etc.), default "  "
# stop_on_errors - raise an exception on a badly-formed document. Default
# is false, i.e. continue to process the rest of the document.
# initial_level - The entire output will be indented by this number of steps.
# Default is 0.
# keep_blank_lines - an integer for the number of consecutive empty lines
# to keep in output.
#
# -------------- ACTUALLY DO THE THING --------------
def do_beautify(html, options = {})
if options[:tab_stops]
options[:indent] = " " * options[:tab_stops]
end
String.new.tap { |output|
HtmlParser.new.scan html.to_s, Builder.new(output, options)
}
end
def beautify(name, input, output, options)
output.puts do_beautify(input, options)
rescue => e
raise "Error parsing #{name}: #{e}"
end
executable = File.basename(__FILE__)
options = {indent: "  "}
parser = OptionParser.new do |opts|
opts.banner = "Usage: #{executable} [options] [file ...]"
opts.separator <<~STRING
#{executable} has two modes of operation:
1. If no files are listed, it will read from standard input and write to
standard output.
2. If files are listed, it will modify each file in place, overwriting it
with the beautified output.
The following options are available:
STRING
opts.on(
"-t", "--tab-stops NUMBER", Integer,
"Set number of spaces per indent (default 2)"
) do |num|
options[:indent] = " " * num
end
opts.on(
"-T", "--tab",
"Indent using tabs"
) do
options[:indent] = "\t"
end
opts.on(
"-i", "--indent-by NUMBER", Integer,
"Indent the output by NUMBER steps (default 0)."
) do |num|
options[:initial_level] = num
end
opts.on(
"-e", "--stop-on-errors",
"Stop when invalid nesting is encountered in the input"
) do |num|
options[:stop_on_errors] = num
end
opts.on(
"-b", "--keep-blank-lines NUMBER", Integer,
"Set number of consecutive blank lines"
) do |num|
options[:keep_blank_lines] = num
end
opts.on(
"-l", "--lint-only",
"Lint only, error on files which would be modified",
"This is not available when reading from standard input"
) do |num|
options[:lint_only] = num
end
end
parser.parse!
if ARGV.any?
failures = []
ARGV.each do |path|
input = File.read(path)
if options[:lint_only]
output = StringIO.new
beautify path, input, output, options
failures << path unless input == output.string
else
temppath = "#{path}.tmp"
File.open(temppath, "w") do |file|
beautify path, input, file, options
end
FileUtils.mv temppath, path
end
end
unless failures.empty?
warn [
"Lint failed - files would be modified:",
*failures
].join("\n")
exit 1
end
else
beautify "standard input", $stdin.read, $stdout, options
end
I told you it wasn't exactly pretty! But now if we set the VS Code extension to use a custom "execute path", setting it to simply bin/erb, it'll work.
Plus we can uninstall the htmlbeautifier gem itself, since the script only needs Ruby's standard library, not the gem.
Thanks for looking into that!
I'm curious though, why didn't you make the changes in a branch and point your Gemfile to that?
Yeah, I guess that could've worked. I just found the library to be so small that it felt simpler to inline. Maybe I'll swap at some point, but it'll be easier to 'ship' changes in the future for my whole team if it's in git.
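For anyone who would rather go the branch route mentioned above, a Gemfile entry along these lines should do it. The fork and branch names are placeholders, not real repositories.

```ruby
# Gemfile (sketch; "your-user/htmlbeautifier" and the branch name are hypothetical)
source "https://rubygems.org"

group :development do
  gem "htmlbeautifier", github: "your-user/htmlbeautifier", branch: "stimulus-data-action"
end
```

With that in place the formatter would presumably need to be run through Bundler (for example via `bundle exec`), so that the forked copy is used rather than any globally installed gem.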
PR: https://github.com/threedaymonk/htmlbeautifier/pull/82