Incorrect spans generated for HTML with higher-plane unicode characters
When parsing HTML that includes characters like "🍋", the start and end FileLocations are generated incorrectly.
Here's a short repro:
import "package:html/dom.dart";
import "package:html/parser.dart";
import "package:source_span/source_span.dart";

void main() {
  final dom = parse(contents, generateSpans: true);
  final Element element = dom.querySelectorAll("link").single;
  final span = element.sourceSpan;
  final spanCopy = new SourceSpan(span.start, span.end, contents);
}
const contents = """
<head>
<meta charset="UTF-8">
<title></title>
<link rel="alternate" type="application/rss+xml" title="ArtLung » Limones 🍋 Comments Feed" href="subdirectory/other.html" />
</head>
""";
This will throw the following error:
Unhandled exception:
Invalid argument(s): Text "<head>
<meta charset="UTF-8">
<title></title>
<link rel="alternate" type="application/rss+xml" title="ArtLung » Limones 🍋 Comments Feed" href="subdirectory/other.html" />
</head>
" must be 130 characters long.
#0 new SourceSpanBase (package:source_span/src/span.dart:85:7)
#1 new SourceSpan (package:source_span/src/span.dart:34:11)
#2 main (file:///Users/filiph/dev/linkcheck/test/source_span_bug.dart:9:24)
#3 _startIsolate.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:265)
#4 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:151)
This is not an issue with package:source_span. When I create the span manually, without parse(), copying it works fine.
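A possible explanation, which is an assumption and not something confirmed in this thread: "🍋" (U+1F34B) lies outside the Basic Multilingual Plane, so Dart stores it as two UTF-16 code units. `String.length`, which `SourceSpan` uses when it checks the text length, counts code units. If the parser counted offsets in code points (runes) instead, every such character before or inside a span would shift the offsets by one. The sketch below only illustrates that discrepancy using `dart:core`. The helper `codePointToCodeUnitOffset` is hypothetical and is not part of package:html or package:source_span.

```dart
// Hypothetical helper (not part of package:html): converts an offset counted
// in Unicode code points into an offset counted in UTF-16 code units, which
// is what String.length and String.substring use.
int codePointToCodeUnitOffset(String text, int codePointOffset) {
  var codeUnits = 0;
  var codePoints = 0;
  for (final rune in text.runes) {
    if (codePoints == codePointOffset) break;
    // Code points above U+FFFF (e.g. U+1F34B, the lemon) take two code units.
    codeUnits += rune > 0xFFFF ? 2 : 1;
    codePoints++;
  }
  return codeUnits;
}

void main() {
  const lemon = '🍋';
  print(lemon.length); // 2: UTF-16 code units (a surrogate pair)
  print(lemon.runes.length); // 1: Unicode code points

  const attr = 'title="Limones 🍋 Comments Feed"';
  // Offset of the closing quote, counted two different ways.
  final codePointOffset = attr.runes.length - 1;
  final codeUnitOffset = attr.length - 1;
  print(codePointOffset); // 30
  print(codeUnitOffset); // 31
  print(codePointToCodeUnitOffset(attr, codePointOffset) == codeUnitOffset); // true
}
```

Under that assumption, a span computed in code points is one unit short for each higher-plane character it contains, which would match a "must be N characters long" error like the one above.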
Hi, friendly nudge. This prevents package:html from being used with HTML that includes Unicode characters in attributes, which is an increasing share of pages (judging by the bugs reported to linkcheck).
I've created a pull request with a fix: https://github.com/dart-lang/html/pull/109
Carriage returns also affect the file location start and end points.
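The HTML specification's input-stream preprocessing turns CR LF pairs and lone CRs into LF before tokenizing. Whether package:html computes its spans over that normalized text is an assumption here, but if it does, each CR LF before a span would move the reported offsets one unit earlier than the same position in the original string. The sketch below shows the shift and one way to map a normalized offset back. Both `normalizeNewlines` and `normalizedToOriginalOffset` are hypothetical helpers for illustration, not APIs of package:html.

```dart
// Hypothetical helper: newline normalization in the style of the HTML spec
// (CR LF -> LF, then any remaining lone CR -> LF).
String normalizeNewlines(String input) =>
    input.replaceAll('\r\n', '\n').replaceAll('\r', '\n');

// Hypothetical helper: maps an offset in the normalized text back to the
// matching offset in the original, un-normalized text.
int normalizedToOriginalOffset(String original, int normalizedOffset) {
  var orig = 0;
  var norm = 0;
  while (norm < normalizedOffset && orig < original.length) {
    // A CR LF pair was collapsed into one '\n', so it consumes two original
    // code units for one normalized code unit; everything else maps 1:1.
    orig += original.startsWith('\r\n', orig) ? 2 : 1;
    norm += 1;
  }
  return orig;
}

void main() {
  const original = '<head>\r\n<title></title>\r\n<link rel="x">';
  final normalized = normalizeNewlines(original);
  final linkInNormalized = normalized.indexOf('<link');
  final linkInOriginal = original.indexOf('<link');
  print(linkInNormalized); // 23
  print(linkInOriginal); // 25
  print(normalizedToOriginalOffset(original, linkInNormalized) ==
      linkInOriginal); // true
}
```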
Hey there, thanks for the great work. Now that the fix is merged, would it be possible to release a new version?
We're stuck with this issue downstream (see https://github.com/filiph/linkcheck/issues/35).