maruku
CharSourceStrscan does not work correctly with UTF-8 strings. Remove it.
CharSourceStrscan, an alternate CharSource implementation that is not enabled by default, assumes every character is 1 byte long. Multi-byte UTF-8 strings break it.
This pull request removes it entirely.
Example:
Rendering
<p>ö <strong>a</strong></p>
In Ruby 1.9.x:
<p>ö <strong>a</strong></p>
In Ruby 2.1 and above:
parse_span.rb:32:in `read_span': invalid byte sequence in UTF-8 (ArgumentError)
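To illustrate why a one-byte-per-character assumption goes wrong, here is a minimal sketch using only core Ruby and the standard strscan library. It does not reproduce Maruku's code path; the input string "ö **a**" and all variable names are assumptions made for illustration.

```ruby
# encoding: utf-8
require 'strscan'

# Assumed input: "ö" is one character but two bytes in UTF-8.
text = "ö **a**"

# Character count and byte count diverge once a multi-byte character appears.
char_count = text.length
byte_count = text.bytesize

# Taking a single byte from the front cuts "ö" in half, leaving invalid UTF-8.
first_byte = text.byteslice(0, 1)
first_byte_valid = first_byte.valid_encoding?

# StringScanner#getch consumes a whole character, but #pos is a byte offset.
scanner = StringScanner.new(text)
first_char = scanner.getch
byte_offset = scanner.pos

puts "chars=#{char_count} bytes=#{byte_count}"
puts "first byte valid? #{first_byte_valid}"
puts "after getch: char=#{first_char.inspect} pos=#{byte_offset}"
```

Any code that treats a scanner position as a character index, or advances one byte at a time, can end up holding half of a character, which is the kind of string that later triggers an "invalid byte sequence" error.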
Coverage increased (+1.4%) to 78.793% when pulling d68f7855df41555823a8186a87b882b245827689 on caseyf:caseyf-remove-charsourcestrscan into ec44b2709d6c617f6c5f7d79caec9b40570cdd68 on bhollis:master.
Alternatively, one can fix CharSourceStrscan to be multi-byte-aware. I would still make CharSourceManual the default, 'cuz it's faster.
A multi-byte aware implementation would replace these methods. Here is a stab at it:
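require 'strscan'

class CharSourceStrscan
  # Current character, or nil at end of input. /./m also matches newlines.
  def cur_char
    @scanner.match?(/./m) && @scanner.matched
  end

  # Up to n characters starting at the current position.
  def cur_chars(n)
    r = Regexp.new(".{0,#{n}}", Regexp::MULTILINE)
    @scanner.match?(r) && @scanner.matched
  end

  # Character after the current one, or nil if fewer than two remain.
  # [-1] picks the last character of the two-character match.
  def next_char
    @scanner.match?(/../m) && @scanner.matched[-1]
  end

  # Consume and return one whole character.
  def shift_char
    @scanner.getch
  end

  # Consume one character and discard it.
  def ignore_char
    @scanner.getch
    nil
  end

  # Consume n characters and discard them.
  def ignore_chars(n)
    n.times { @scanner.getch }
    nil
  end
end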
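As a quick check of the idea, the following stand-alone sketch wraps a StringScanner with the same regexp-based lookahead and runs it on a string containing a multi-byte character. The class name MultiByteSourceSketch and the input "ö **a**" are hypothetical; this is not Maruku's actual class, only an illustration of the approach.

```ruby
# encoding: utf-8
require 'strscan'

# Hypothetical wrapper for illustration; not part of Maruku.
class MultiByteSourceSketch
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  # Look at the current character without consuming it.
  def cur_char
    @scanner.match?(/./m) && @scanner.matched
  end

  # Look at up to n characters without consuming them.
  def cur_chars(n)
    @scanner.match?(/.{0,#{n}}/m) && @scanner.matched
  end

  # Look at the character after the current one.
  def next_char
    @scanner.match?(/../m) && @scanner.matched[-1]
  end

  # Consume and return one whole character.
  def shift_char
    @scanner.getch
  end
end

src = MultiByteSourceSketch.new("ö **a**")  # assumed input
seen = []
seen << src.cur_char      # lookahead, nothing consumed
seen << src.next_char
seen << src.cur_chars(3)
seen << src.shift_char    # consumes "ö" (two bytes, one character)
seen << src.cur_char
p seen
```

Because every lookahead goes through a regexp match and every advance goes through getch, the scanner always moves by whole characters rather than by bytes.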
If there's interest in a multi-byte-aware version, I can make a pull request out of the above-linked commits.