Fix parsing of link destinations that look like `code` or <html>
stack bench output:
Before
commonmark-cli > build (exe)
commonmark-cli > Preprocessing executable 'commonmark' for commonmark-cli-0.2.1..
commonmark-cli > Building executable 'commonmark' for commonmark-cli-0.2.1..
commonmark-cli > copy/register
commonmark-cli > Installing executable commonmark in /home/michael/Development/commonmark-hs/.stack-work/install/x86_64-linux/d6a7ffd91072a14c336d55f594f5fe3da20e2523e0123ffc206169986025661e/8.10.7/bin
commonmark > benchmarks
Running 1 benchmarks...
Benchmark benchmark-commonmark: RUNNING...
All
tokenize
tokenize sample.md: OK (0.24s)
3.6 ms ± 250 μs
parse sample.md
commonmark default: OK (0.34s)
48 ms ± 2.3 ms
pathological
nested strong emph
commonmark
1000: OK (0.29s)
1.1 ms ± 56 μs
2000: OK (0.28s)
2.2 ms ± 103 μs
3000: OK (0.43s)
3.3 ms ± 109 μs
4000: OK (0.28s)
4.5 ms ± 323 μs
many emph closers with no openers
commonmark
1000: OK (0.54s)
1.0 ms ± 44 μs
2000: OK (0.14s)
2.2 ms ± 200 μs
3000: OK (0.42s)
3.3 ms ± 91 μs
4000: OK (0.27s)
4.3 ms ± 322 μs
many emph openers with no closers
commonmark
1000: OK (0.26s)
1.0 ms ± 54 μs
2000: OK (0.27s)
2.0 ms ± 127 μs
3000: OK (0.20s)
3.2 ms ± 214 μs
4000: OK (4.48s)
4.3 ms ± 355 μs
many link closers with no openers
commonmark
1000: OK (0.32s)
1.2 ms ± 91 μs
2000: OK (0.39s)
2.9 ms ± 279 μs
3000: OK (1.05s)
4.0 ms ± 57 μs
4000: OK (0.37s)
5.5 ms ± 342 μs
many link openers with no closers
commonmark
1000: OK (0.33s)
1.2 ms ± 63 μs
2000: OK (0.17s)
2.6 ms ± 175 μs
3000: OK (0.54s)
4.3 ms ± 419 μs
4000: OK (0.36s)
5.5 ms ± 274 μs
mismatched openers and closers
commonmark
1000: OK (0.35s)
11 ms ± 698 μs
2000: OK (0.30s)
43 ms ± 3.3 ms
3000: OK (0.29s)
97 ms ± 3.3 ms
4000: OK (0.53s)
175 ms ± 5.0 ms
openers and closers multiple of 3
commonmark
1000: OK (0.37s)
3.0 ms ± 144 μs
2000: OK (0.32s)
10 ms ± 824 μs
3000: OK (0.35s)
23 ms ± 1.2 ms
4000: OK (0.12s)
40 ms ± 2.7 ms
link openers and emph closers
commonmark
1000: OK (0.30s)
1.1 ms ± 113 μs
2000: OK (0.30s)
2.3 ms ± 213 μs
3000: OK (0.23s)
3.6 ms ± 190 μs
4000: OK (0.31s)
4.9 ms ± 295 μs
nested brackets
commonmark
1000: OK (0.44s)
6.8 ms ± 251 μs
2000: OK (0.38s)
25 ms ± 925 μs
3000: OK (0.17s)
56 ms ± 3.0 ms
4000: OK (0.29s)
96 ms ± 5.9 ms
inline link openers without closers
commonmark
1000: OK (0.55s)
2.1 ms ± 71 μs
2000: OK (0.28s)
4.4 ms ± 291 μs
3000: OK (0.44s)
6.8 ms ± 256 μs
4000: OK (0.15s)
10 ms ± 729 μs
repeated pattern '[ (]('
commonmark
1000: OK (0.36s)
1.4 ms ± 48 μs
2000: OK (0.37s)
2.8 ms ± 187 μs
3000: OK (0.26s)
4.3 ms ± 170 μs
4000: OK (0.39s)
5.9 ms ± 239 μs
nested block quotes
commonmark
1000: OK (0.49s)
937 μs ± 28 μs
2000: OK (0.27s)
2.0 ms ± 162 μs
3000: OK (0.85s)
3.3 ms ± 183 μs
4000: OK (0.15s)
4.7 ms ± 387 μs
nested list
commonmark
1000: OK (0.14s)
529 μs ± 50 μs
2000: OK (0.40s)
769 μs ± 25 μs
3000: OK (0.27s)
1.0 ms ± 50 μs
4000: OK (0.33s)
1.3 ms ± 57 μs
nested list 2
commonmark
1000: OK (0.34s)
2.8 ms ± 118 μs
2000: OK (0.42s)
6.4 ms ± 395 μs
3000: OK (0.32s)
10 ms ± 620 μs
4000: OK (0.48s)
15 ms ± 518 μs
backticks
commonmark
1000: OK (0.20s)
400 μs ± 29 μs
2000: OK (0.23s)
857 μs ± 52 μs
3000: OK (0.32s)
1.2 ms ± 43 μs
4000: OK (0.22s)
1.6 ms ± 147 μs
CDATA
commonmark
1000: OK (0.23s)
898 μs ± 74 μs
2000: OK (0.23s)
1.8 ms ± 89 μs
3000: OK (0.37s)
2.9 ms ± 127 μs
4000: OK (0.49s)
3.8 ms ± 173 μs
<?
commonmark
1000: OK (24.57s)
1.5 ms ± 42 μs
2000: OK (0.81s)
3.1 ms ± 187 μs
3000: OK (0.17s)
5.5 ms ± 398 μs
4000: OK (0.84s)
6.3 ms ± 305 μs
<!A
commonmark
1000: OK (0.31s)
1.2 ms ± 105 μs
2000: OK (0.32s)
2.4 ms ± 132 μs
3000: OK (0.24s)
3.7 ms ± 262 μs
4000: OK (0.33s)
5.0 ms ± 423 μs
All 74 tests passed (54.37s)
Benchmark benchmark-commonmark: FINISH
commonmark-extensions> benchmarks
Running 1 benchmarks...
Benchmark benchmark-commonmark-extensions: RUNNING...
All
commonmark +smart: OK (0.37s)
50 ms ± 4.2 ms
commonmark +autolink: OK (0.36s)
52 ms ± 2.5 ms
commonmark +attributes: OK (1.47s)
47 ms ± 1.9 ms
commonmark +pipe_table: OK (0.32s)
43 ms ± 4.1 ms
All 4 tests passed (2.55s)
Benchmark benchmark-commonmark-extensions: FINISH
Completed 3 action(s).
After
$ stack bench
commonmark > benchmarks
Running 1 benchmarks...
Benchmark benchmark-commonmark: RUNNING...
All
tokenize
tokenize sample.md: OK (0.24s)
3.7 ms ± 221 μs
parse sample.md
commonmark default: OK (0.14s)
47 ms ± 4.6 ms
pathological
nested strong emph
commonmark
1000: OK (0.28s)
1.1 ms ± 43 μs
2000: OK (0.62s)
2.5 ms ± 181 μs
3000: OK (0.44s)
3.4 ms ± 198 μs
4000: OK (0.30s)
4.6 ms ± 213 μs
many emph closers with no openers
commonmark
1000: OK (0.55s)
1.1 ms ± 50 μs
2000: OK (0.28s)
2.1 ms ± 112 μs
3000: OK (0.21s)
3.2 ms ± 300 μs
4000: OK (0.28s)
4.4 ms ± 240 μs
many emph openers with no closers
commonmark
1000: OK (0.26s)
1.0 ms ± 46 μs
2000: OK (0.27s)
2.1 ms ± 144 μs
3000: OK (0.43s)
3.3 ms ± 209 μs
4000: OK (0.29s)
4.3 ms ± 199 μs
many link closers with no openers
commonmark
1000: OK (0.33s)
1.2 ms ± 74 μs
2000: OK (0.35s)
2.6 ms ± 193 μs
3000: OK (0.31s)
4.7 ms ± 275 μs
4000: OK (0.75s)
5.4 ms ± 151 μs
many link openers with no closers
commonmark
1000: OK (0.34s)
1.3 ms ± 48 μs
2000: OK (0.35s)
2.6 ms ± 123 μs
3000: OK (0.16s)
5.1 ms ± 499 μs
4000: OK (0.38s)
5.6 ms ± 261 μs
mismatched openers and closers
commonmark
1000: OK (0.37s)
11 ms ± 624 μs
2000: OK (0.14s)
45 ms ± 4.5 ms
3000: OK (0.30s)
98 ms ± 3.6 ms
4000: OK (0.54s)
179 ms ± 8.1 ms
openers and closers multiple of 3
commonmark
1000: OK (0.19s)
3.1 ms ± 229 μs
2000: OK (0.36s)
12 ms ± 1.0 ms
3000: OK (0.34s)
22 ms ± 1.5 ms
4000: OK (0.61s)
40 ms ± 1.6 ms
link openers and emph closers
commonmark
1000: OK (0.63s)
1.1 ms ± 103 μs
2000: OK (0.32s)
2.4 ms ± 104 μs
3000: OK (0.94s)
3.6 ms ± 196 μs
4000: OK (0.65s)
5.0 ms ± 249 μs
nested brackets
commonmark
1000: OK (0.45s)
6.8 ms ± 464 μs
2000: OK (0.18s)
25 ms ± 1.8 ms
3000: OK (0.17s)
55 ms ± 4.8 ms
4000: OK (0.29s)
96 ms ± 4.5 ms
inline link openers without closers
commonmark
1000: OK (0.29s)
2.1 ms ± 130 μs
2000: OK (0.30s)
4.5 ms ± 251 μs
3000: OK (0.46s)
6.9 ms ± 558 μs
4000: OK (0.63s)
9.8 ms ± 402 μs
repeated pattern '[ (]('
commonmark
1000: OK (0.38s)
1.5 ms ± 55 μs
2000: OK (0.39s)
2.8 ms ± 238 μs
3000: OK (0.29s)
4.3 ms ± 279 μs
4000: OK (0.40s)
6.0 ms ± 554 μs
nested block quotes
commonmark
1000: OK (0.50s)
941 μs ± 23 μs
2000: OK (0.54s)
2.0 ms ± 43 μs
3000: OK (0.21s)
3.2 ms ± 316 μs
4000: OK (0.29s)
4.2 ms ± 223 μs
nested list
commonmark
1000: OK (0.28s)
529 μs ± 24 μs
2000: OK (0.42s)
813 μs ± 47 μs
3000: OK (0.30s)
1.1 ms ± 66 μs
4000: OK (0.36s)
1.3 ms ± 128 μs
nested list 2
commonmark
1000: OK (0.36s)
2.8 ms ± 108 μs
2000: OK (0.42s)
6.3 ms ± 387 μs
3000: OK (1.33s)
10 ms ± 167 μs
4000: OK (0.49s)
15 ms ± 796 μs
backticks
commonmark
1000: OK (0.21s)
406 μs ± 23 μs
2000: OK (0.22s)
862 μs ± 57 μs
3000: OK (0.33s)
1.3 ms ± 57 μs
4000: OK (0.23s)
1.7 ms ± 97 μs
CDATA
commonmark
1000: OK (0.24s)
916 μs ± 43 μs
2000: OK (0.49s)
1.8 ms ± 51 μs
3000: OK (0.38s)
2.9 ms ± 93 μs
4000: OK (0.50s)
3.8 ms ± 93 μs
<?
commonmark
1000: OK (0.40s)
1.5 ms ± 53 μs
2000: OK (0.40s)
3.0 ms ± 95 μs
3000: OK (0.31s)
4.6 ms ± 390 μs
4000: OK (0.22s)
6.2 ms ± 570 μs
<!A
commonmark
1000: OK (0.32s)
1.2 ms ± 55 μs
2000: OK (0.33s)
2.4 ms ± 144 μs
3000: OK (0.48s)
3.7 ms ± 157 μs
4000: OK (0.33s)
4.9 ms ± 202 μs
All 74 tests passed (28.22s)
Benchmark benchmark-commonmark: FINISH
commonmark-extensions> benchmarks
Running 1 benchmarks...
Benchmark benchmark-commonmark-extensions: RUNNING...
All
commonmark +smart: OK (0.78s)
50 ms ± 2.0 ms
commonmark +autolink: OK (0.39s)
53 ms ± 4.1 ms
commonmark +attributes: OK (0.74s)
48 ms ± 1.0 ms
commonmark +pipe_table: OK (0.35s)
47 ms ± 3.0 ms
All 4 tests passed (2.25s)
Benchmark benchmark-commonmark-extensions: FINISH
Completed 2 action(s).
Fixes #136
This works by re-parsing the tokens that come after the link, but only when the end delimiter isn't on a chunk boundary (since that's the only way this problem can happen).
Re-parsing a specific chunk won't work, because the part that needs re-interpreted can span more than one chunk. For example, we can draw the bounds of the erroneous code chunk in this example:
[x](`) <a href="`">
^-----------^
If we re-parse the underlined part in isolation, we'll fix the first link, but won't find the HTML (since the closing angle bracket is in the next chunk).
On the other hand, parsing links, code, and HTML in a single pass would make writing extensions more complicated. For example, LaTeX math is supposed to have the same binding strength as code spans:
$first[$](about)
^------^ this is a math span, not a link
[first]($)$5/8$
^-^ this is an analogue of the original bug
it shouldn't be a math span, but looks like one
@jgm other than the merge conflict in regression.md (a simple rebase), is there anything blocking this fix?
This probably just arrived at a busy time; I will look at. It would help if you could rebase.
Okay, it’s rebased.
Any feedback on this PR?
Sorry, I was on vacation when this posted and it got ignored. I'll try to have a look in the near future!
Do the new comments and variable names help?
Yes, this looks great, thanks!