rustdoc: check parsing diffs between pulldown-cmark 0.9.6 and 0.10
This commit is not meant to be merged as-is. It's meant to run in Crater, so that we can estimate the impact of bumping to the new version of the markdown parser.
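To make the idea of a "parsing diff" concrete, here is a minimal sketch of comparing two event streams and reporting the first point of disagreement. The `Event` enum and `first_divergence` function are hypothetical stand-ins written for illustration; they are not the pulldown-cmark `Event` type or the lint code in this PR.

```rust
/// Hypothetical, simplified stand-in for a Markdown parser event.
/// The real pulldown-cmark event type carries much more data.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    StartParagraph,
    EndParagraph,
    Text(String),
    SoftBreak,
}

/// Returns the index and the differing pair at the first position where the
/// two event streams disagree, or `None` if they are identical.
/// A stream that has already ended is reported as `None` on that side.
fn first_divergence(
    old: &[Event],
    new: &[Event],
) -> Option<(usize, Option<Event>, Option<Event>)> {
    let len = old.len().max(new.len());
    for i in 0..len {
        let o = old.get(i);
        let n = new.get(i);
        if o != n {
            return Some((i, o.cloned(), n.cloned()));
        }
    }
    None
}
```

A `None` on one side of the returned pair corresponds to the "sees nothing" wording that shows up in some of the diagnostics.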
r? rustdoc
The job x86_64-gnu-llvm-16 failed! Check out the build log: (web) (plain)
GITHUB_ACTION=__run_7
GITHUB_ACTIONS=true
GITHUB_ACTION_REF=
GITHUB_ACTION_REPOSITORY=
GITHUB_ACTOR=notriddle
GITHUB_API_URL=https://api.github.com
GITHUB_BASE_REF=master
GITHUB_ENV=/home/runner/work/_temp/_runner_file_commands/set_env_d841a48d-ece5-40c4-a461-5def615ae6a1
GITHUB_EVENT_NAME=pull_request
GITHUB_EVENT_PATH=/home/runner/work/_temp/_github_workflow/event.json
GITHUB_GRAPHQL_URL=https://api.github.com/graphql
GITHUB_HEAD_REF=notriddle/bump-pulldown-cmark
GITHUB_JOB=pr
GITHUB_PATH=/home/runner/work/_temp/_runner_file_commands/add_path_d841a48d-ece5-40c4-a461-5def615ae6a1
GITHUB_REF=refs/pull/121659/merge
GITHUB_REF_NAME=121659/merge
GITHUB_REF_PROTECTED=false
---
GITHUB_SERVER_URL=https://github.com
GITHUB_SHA=676afa8e8fcda392872444abd520103e6b264b0c
GITHUB_STATE=/home/runner/work/_temp/_runner_file_commands/save_state_d841a48d-ece5-40c4-a461-5def615ae6a1
GITHUB_STEP_SUMMARY=/home/runner/work/_temp/_runner_file_commands/step_summary_d841a48d-ece5-40c4-a461-5def615ae6a1
GITHUB_TRIGGERING_ACTOR=notriddle
GITHUB_WORKFLOW_REF=rust-lang/rust/.github/workflows/ci.yml@refs/pull/121659/merge
GITHUB_WORKFLOW_SHA=676afa8e8fcda392872444abd520103e6b264b0c
GITHUB_WORKSPACE=/home/runner/work/rust/rust
GOROOT_1_19_X64=/opt/hostedtoolcache/go/1.19.13/x64
---
#12 writing image sha256:eff01709df2a20614239476b37ee9a544a161d621a3a3eeb34c04508ef288e36 done
#12 naming to docker.io/library/rust-ci done
#12 DONE 10.0s
##[endgroup]
Setting extra environment values for docker: --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-16]
##[group]Clock drift check
local time: Tue Feb 27 01:26:42 UTC 2024
network time: Tue, 27 Feb 2024 01:26:42 GMT
##[endgroup]
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure:
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-16', '--enable-llvm-link-shared', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'build.optimized-compiler-builtins', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-16/bin/llvm-config
configure: llvm.link-shared := True
configure: rust.thin-lto-import-instr-limit := 10
configure: change-id := 99999999
---
##[endgroup]
Testing GCC stage1 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Compiling y v0.1.0 (/checkout/compiler/rustc_codegen_gcc/build_system)
Finished release [optimized] target(s) in 1.21s
Running `/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-codegen/x86_64-unknown-linux-gnu/release/y test --use-system-gcc --use-backend gcc --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/cg_gcc --release --no-default-features --mini-tests --std-tests`
Using system GCC
[BUILD] example
[AOT] mini_core_hello_world
/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-tools/cg_gcc/mini_core_hello_world
abc
---
---- [rustdoc] tests/rustdoc/footnote-definition-without-blank-line-100638.rs stdout ----
error: rustdoc failed!
status: exit status: 1
command: RUSTC_ICE="0" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustdoc" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc/footnote-definition-without-blank-line-100638/auxiliary" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc/footnote-definition-without-blank-line-100638" "--deny" "warnings" "/checkout/tests/rustdoc/footnote-definition-without-blank-line-100638.rs" "-A" "internal_features"
--- stderr -------------------------------
Build completed unsuccessfully in 0:17:16
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/footnote-definition-without-blank-line-100638.rs:5:11
|
5 | //! [^1]: Footnote A.
| ___________^
6 | | //! [^2]: Footnote B.
|
|
= help: old parser sees SoftBreak, new sees End(Paragraph)
= note: `#[deny(rustdoc::unportable_markdown)]` on by default
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/footnote-definition-without-blank-line-100638.rs:6:5
|
|
6 | //! [^2]: Footnote B.
| _____^
7 | | //! [^3]: Footnote C.
|
|
= help: new parser sees Start(Paragraph), old sees Start(FootnoteDefinition(Borrowed("2")))
error: aborting due to 2 previous errors
------------------------------------------
---- [rustdoc] tests/rustdoc/issue-107995.rs stdout ----
error: rustdoc failed!
status: exit status: 1
command: RUSTC_ICE="0" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustdoc" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc/issue-107995/auxiliary" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc/issue-107995" "--deny" "warnings" "/checkout/tests/rustdoc/issue-107995.rs" "-A" "internal_features"
--- stderr -------------------------------
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/issue-107995.rs:7:5
|
|
7 | /// A foo, see also [ bar`]
|
|
= help: old parser sees Start(Link(ShortcutUnknown, Borrowed(""), Borrowed("bar"))), new sees End(Paragraph)
= note: `#[deny(rustdoc::unportable_markdown)]` on by default
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/issue-107995.rs:13:1
|
|
13 | / #[doc = "line ["]
14 | | #[doc = "Path"]
15 | | #[doc = "] line"]
|
|
= help: new parser sees SoftBreak, old sees nothing
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/issue-107995.rs:13:1
|
|
13 | / #[doc = "line ["]
14 | | #[doc = "Path"]
15 | | #[doc = "] line"]
|
|
= help: new parser sees SoftBreak, old sees SoftBreak
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/issue-107995.rs:20:5
|
|
20 | /// [ `Path`]
|
|
= help: new parser sees End(Paragraph), old sees Start(Link(ShortcutUnknown, Borrowed(""), Borrowed("Path")))
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/issue-107995.rs:25:5
|
|
25 | /// [ Path`]
|
|
= help: new parser sees End(Paragraph), old sees Start(Link(ShortcutUnknown, Borrowed(""), Borrowed("Path")))
error: aborting due to 5 previous errors
------------------------------------------
---- [rustdoc] tests/rustdoc/task-lists.rs stdout ----
error: rustdoc failed!
status: exit status: 1
command: RUSTC_ICE="0" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustdoc" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc/task-lists/auxiliary" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc/task-lists" "--deny" "warnings" "/checkout/tests/rustdoc/task-lists.rs" "-A" "internal_features"
--- stderr -------------------------------
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/task-lists.rs:12:7
|
|
12 | //! - [ ] a
|
|
= help: new parser sees TaskListMarker(false), old sees TaskListMarker(false)
= note: `#[deny(rustdoc::unportable_markdown)]` on by default
error: unportable markdown
##[error] --> /checkout/tests/rustdoc/task-lists.rs:13:7
|
|
13 | //! - [x] b
|
|
= help: new parser sees TaskListMarker(true), old sees TaskListMarker(true)
error: aborting due to 2 previous errors
------------------------------------------
The job x86_64-gnu-tools failed! Check out the build log: (web) (plain)
GITHUB_ACTION=__run_7
GITHUB_ACTIONS=true
GITHUB_ACTION_REF=
GITHUB_ACTION_REPOSITORY=
GITHUB_ACTOR=notriddle
GITHUB_API_URL=https://api.github.com
GITHUB_BASE_REF=master
GITHUB_ENV=/home/runner/work/_temp/_runner_file_commands/set_env_114ccbab-9101-4fbe-814d-a1be5a776b8b
GITHUB_EVENT_NAME=pull_request
GITHUB_EVENT_PATH=/home/runner/work/_temp/_github_workflow/event.json
GITHUB_GRAPHQL_URL=https://api.github.com/graphql
GITHUB_HEAD_REF=notriddle/bump-pulldown-cmark
GITHUB_JOB=pr
GITHUB_PATH=/home/runner/work/_temp/_runner_file_commands/add_path_114ccbab-9101-4fbe-814d-a1be5a776b8b
GITHUB_REF=refs/pull/121659/merge
GITHUB_REF_NAME=121659/merge
GITHUB_REF_PROTECTED=false
---
GITHUB_SERVER_URL=https://github.com
GITHUB_SHA=cdba3adfb773c20cd5242c0ec6f297f36db52e82
GITHUB_STATE=/home/runner/work/_temp/_runner_file_commands/save_state_114ccbab-9101-4fbe-814d-a1be5a776b8b
GITHUB_STEP_SUMMARY=/home/runner/work/_temp/_runner_file_commands/step_summary_114ccbab-9101-4fbe-814d-a1be5a776b8b
GITHUB_TRIGGERING_ACTOR=notriddle
GITHUB_WORKFLOW_REF=rust-lang/rust/.github/workflows/ci.yml@refs/pull/121659/merge
GITHUB_WORKFLOW_SHA=cdba3adfb773c20cd5242c0ec6f297f36db52e82
GITHUB_WORKSPACE=/home/runner/work/rust/rust
GOROOT_1_19_X64=/opt/hostedtoolcache/go/1.19.13/x64
---
Documenting test_docs v0.1.0 (/checkout/tests/rustdoc-gui/src/test_docs)
error: unportable markdown
##[error] --> lib.rs:502:5
|
502 | /// <sub id="codeblock-sub-1">
503 | | ///
504 | | /// ```
505 | | /// one
506 | | /// ```
| |_______^
|
= help: old parser sees Start(CodeBlock(Fenced(Borrowed("")))), new sees Start(HtmlBlock)
= note: `#[deny(rustdoc::unportable_markdown)]` on by default
error: unportable markdown
##[error] --> lib.rs:71:9
|
|
71 | /// <div id="doc-warning-1" class="warning">this is a warning</div>
72 | | ///
73 | | /// done
| |____________^
|
|
= help: old parser sees Start(Paragraph), new sees Start(HtmlBlock)
error: could not document `test_docs`
Caused by:
process didn't exit successfully: `/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustdoc --edition=2018 --crate-type lib --crate-name test_docs lib.rs -o /checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc-gui/doc --cfg 'feature="default"' --cfg 'feature="some-feature"' --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat -C metadata=5e54571322b3d3f6 -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/test/rustdoc-gui/debug/deps --crate-version 0.1.0` (exit status: 1)
failed to document `/checkout/tests/rustdoc-gui/src/test_docs`
Cannot run rustdoc-gui tests
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Build completed unsuccessfully in 0:00:26
local time: Tue Feb 27 05:24:16 UTC 2024
I think it looks ready for a crater run.
@bors try
:hourglass: Trying commit 5b1ebc2535d1d0d4bb9099c2fd6eb318a515b978 with merge 223112bf1ca31911a6475910c77b36bfa127d5f8...
@craterbot run name=pr-121659-bump-pulldown-cmark mode=rustdoc
:rotating_light: Error: missing start toolchain
:sos: If you have any trouble with Crater please ping @rust-lang/infra!
:information_source: Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more
@craterbot run name=pr-121659-bump-pulldown-cmark mode=rustdoc start=master#ef324565d071c6d7e2477a195648549e33d6a465 end=try#223112bf1ca31911a6475910c77b36bfa127d5f8
:ok_hand: Experiment pr-121659-bump-pulldown-cmark created and queued.
:mag: You can check out the queue and this experiment's details.
:sunny: Try build successful - checks-actions
Build commit: 223112bf1ca31911a6475910c77b36bfa127d5f8 (223112bf1ca31911a6475910c77b36bfa127d5f8)
:construction: Experiment pr-121659-bump-pulldown-cmark is now running
:tada: Experiment pr-121659-bump-pulldown-cmark is completed!
:bar_chart: 267 regressed and 3 fixed (422032 total)
:newspaper: Open the full report.
:warning: If you notice any spurious failure please add them to the blacklist!
Roughly categorizing what I'm seeing here. The specific crates are in this Gist:
(P)roblems: docs that are broken by this change
P1: unintended strikethrough
https://github.com/pulldown-cmark/pulldown-cmark/pull/648 allows abso~fricking~lutely to render with the middle struck through, as abso<del>fricking</del>lutely. It makes pulldown-cmark agree with cmark-gfm.
This change causes 39 docs to have strikethroughs where it looks like the author intended literal tildes.
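As a rough illustration of the pattern involved, the sketch below finds single tildes that sit in the middle of a word. The function name `intraword_single_tildes` is made up for this example, and this is only a plain text scan; it is not how pulldown-cmark or cmark-gfm decide what counts as strikethrough.

```rust
/// Returns the byte offsets of every `~` that has an alphanumeric character
/// directly on both sides, i.e. a lone tilde in the middle of a word.
/// Tildes that are part of a `~~` run are never reported, because their
/// neighbour on one side is another tilde rather than a word character.
fn intraword_single_tildes(text: &str) -> Vec<usize> {
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let mut hits = Vec::new();
    for (i, &(offset, c)) in chars.iter().enumerate() {
        if c != '~' {
            continue;
        }
        let prev_is_word = i > 0 && chars[i - 1].1.is_alphanumeric();
        let next_is_word = chars
            .get(i + 1)
            .map_or(false, |&(_, n)| n.is_alphanumeric());
        if prev_is_word && next_is_word {
            hits.push(offset);
        }
    }
    hits
}
```

Run over a doc comment, a non-empty result points at places where the author may have meant literal tildes.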
P2: Unintended block quote
https://github.com/pulldown-cmark/pulldown-cmark/pull/675 makes block quotes, introduced by >, consistent between paragraph interruption and starting after a blank line. It's pretty unambiguous that this is a bug fix.
Causes 24 crates to appear wrong.
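One heuristic for spotting these, proposed further down in this thread, is to flag a `>` that is not followed by a space. Below is a minimal sketch of that check as a plain line scan. The function name and the exact rules (up to three spaces of indentation, a bare `>` at end of line is not flagged) are assumptions made for this example, not an existing rustdoc lint.

```rust
/// Returns the 1-based numbers of lines whose first non-space character is
/// `>` (with at most three spaces of indentation) and where that `>` is
/// immediately followed by a non-whitespace character.
/// Note that this simple version also flags nested quotes written as `>>`.
fn suspicious_block_quotes(doc: &str) -> Vec<usize> {
    doc.lines()
        .enumerate()
        .filter(|(_, line)| {
            let trimmed = line.trim_start_matches(' ');
            let indent = line.len() - trimmed.len();
            let mut rest = trimmed.chars();
            indent <= 3
                && rest.next() == Some('>')
                && matches!(rest.next(), Some(c) if !c.is_whitespace())
        })
        .map(|(i, _)| i + 1)
        .collect()
}
```

For each reported line, the suggestion would be either to add a space after the `>` or to escape it with a backslash.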
P3: Footnote reference immediately after link or another footnote
https://github.com/pulldown-cmark/pulldown-cmark/pull/773
The way that pulldown-cmark ignores the apparent footnote reference in [something][^foot] is the same way GFM does it, and it's the easiest way to parse this given everything else, but it's weird.
Causes 16 crates to appear wrong.
P4: block nested inside footnote definition without indenting
https://github.com/pulldown-cmark/pulldown-cmark/pull/654
This is, basically, the expected outcome of GFM-compatible footnote parsing. The syntax now parses the same way GitHub does, which is not the way rustdoc used to.
Causes 6 crates to appear wrong.
P5: emphasis does not match what author expects
In a text like __imp_(_), CommonMark rules say that the result should be _<em>imp</em>(_), with only "imp" emphasized, but I'm pretty sure the author intended those to be literal underscores. In the old parser, they were.
Causes 2 crates to appear wrong.
P6: Single | does not continue table
This isn't what GitHub does, and the new parser aligns more closely with GitHub, but some authors have used this as a way to put dividers in their table:
| header | two |
|--------|-----|
| item | n |
| item | e |
|
| section | l |
| two | o |
Causes 6 crates to appear wrong.
P7: Table is required to have a valid header line
This is not a table according to GitHub, but the old parser would render it as one sometimes.
| first | second |
| third | fourth |
| fifth | sixth |
The header line also needs to have at least one hyphen in each cell, so this isn't allowed either.
| first ||
|--------||
| second ||
Causes 4 crates to appear wrong.
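The "at least one hyphen in each cell" requirement can be sketched as a small check on the delimiter row. This is a simplified illustration with a made-up function name; it ignores escaped pipes and is not the parser's actual implementation.

```rust
/// Returns true if `row` looks like a table delimiter row: it contains at
/// least one `|`, and every cell, after trimming, is one or more `-` with an
/// optional leading and/or trailing `:` for alignment.
fn is_delimiter_row(row: &str) -> bool {
    let trimmed = row.trim();
    if !trimmed.contains('|') {
        return false;
    }
    // Drop one optional leading pipe and one optional trailing pipe.
    let inner = trimmed.strip_prefix('|').unwrap_or(trimmed);
    let inner = inner.strip_suffix('|').unwrap_or(inner);
    if inner.is_empty() {
        return false;
    }
    inner.split('|').all(|cell| {
        let cell = cell.trim();
        let cell = cell.strip_prefix(':').unwrap_or(cell);
        let cell = cell.strip_suffix(':').unwrap_or(cell);
        !cell.is_empty() && cell.chars().all(|c| c == '-')
    })
}
```

With this check, the `|--------||` row from the second example is rejected because its second cell is empty, and a row of ordinary text cells is rejected because it contains no hyphen-only cells.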
P8: unintended link definition
This is a link definition in the new parser, but not the old one.
[Self::method()]:
frobnicates
Causes 1 crate to appear wrong.
(F)ixes: docs that actually render better with the new parser than the old one
F1: [^x] in the old parser rendered a broken footnote link when no footnote was intended or defined
https://github.com/pulldown-cmark/pulldown-cmark/pull/654
Usually, they're trying to write regex inverted character classes, but they come out as footnotes.
This fix also makes things consistent with GitHub, and the fact that it's not breaking anyone's docs makes me happy.
Fixes 39 crates
F2: writing 1. alone on a line in the middle of a paragraph shouldn't start a list
https://github.com/pulldown-cmark/pulldown-cmark/pull/681
It used to do the wrong thing with this:
Test paragraph with a count of
1.
Fixes 6 crates
F3: tables interrupt paragraphs
https://github.com/pulldown-cmark/pulldown-cmark/pull/653
This makes things consistent with GitHub, and fixes NN docs. Not likely to be a problem, because you kinda have to try really hard to make something look like a table.
Fixes 37 crates
F4: ASCII art misinterpreted as list
Since asterisks on their own no longer count as lists when they interrupt paragraphs, some things that were never intended as list markers stop being seen as such.
Fixes 1 crate
F5: footnote has nested, indented children or lazy continuation
This is the flip side of P4, where the author actually wanted it to be parsed the way GitHub parses it.
Fixes 2 crates
F6: Link definition is seen in new parser where old parser did not
The old parser didn't properly recognize link reference definitions when they were right after code blocks.
Fixes 1 crate
F7: block structure and inline structure are separated better in new parser
Consider this example:
> [my link](https://example.com "Example web site
> run by the IETF")
The > on the second line shouldn't go in there, and now it doesn't.
Fixes 2 crates
F8: a table row followed by two spaces isn't a hard break
A very specific bug that only seems to show up when a paragraph is nested inside a list. The new parser fixes it, it only affects 1 crate, and the change makes the generated docs look better.
Fixes 2 crates
F9: footnote with <autolink> was misparsed as link def
The old parser thought this was a link definition:
[^0]: <https://example.com>
The new parser sees it as a footnote, which is also how GitHub does it.
Fixes 1 crate
F10: indented or spaced link definition still counts
This document:
[first]: https://example.com
[second]: https://example.com
[first]
[second]
The old parser ignored the definition of [second], so it only saw one link, but the new one doesn't, so it sees two links. This also agrees with the reference implementation.
The new parser also trims spaces, so [x ] and [x] are the same thing.
Fixes 2 crates
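The space trimming can be pictured as label normalization. The sketch below follows the CommonMark description of label matching (trim, collapse internal whitespace, ignore case). The function name is made up, and `to_lowercase` is only an approximation of the Unicode case fold that the spec asks for.

```rust
/// Normalizes a link label for comparison: surrounding whitespace is
/// dropped, internal whitespace runs collapse to a single space, and the
/// result is lowercased (an approximation of Unicode case folding).
fn normalize_label(label: &str) -> String {
    label
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}
```

Two labels refer to the same definition when their normalized forms are equal, which is why `[x ]` and `[x]` match.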
F11: intra-word strikethrough
This is the flip side of P1, where the author apparently intended to write intra-word strikeout.
Fixes 1 crate
(Q)uestionable cases, where the docs were broken before and are still broken now
Q1: crate author wrote ASCII art, or some language that isn't CommonMark, in their doc comments
This happened on N crates, and produces terrible results in both parsers. I'm not bothering to categorize them by which change to pulldown-cmark causes them to render differently.
There are 30 crates that do this
Q2: links with mismatched parens go from being broken links to not being links at all
https://github.com/pulldown-cmark/pulldown-cmark/pull/738
There are 4 crates that do this
Q3: incorrect trim in block doc comment
Block comments are supposed to be written like this:
/**
 * first
 * second
 * third
 */
^^ these two characters are trimmed
If you get this wrong, the text that you intended to be a paragraph gets turned into a long unordered list instead, because trim is computed by checking every line for a common prefix.
It shows up here because incorrectly trimmed block doc comments often have asterisks with no text after them, and https://github.com/pulldown-cmark/pulldown-cmark/pull/681 changes it from a list to plain text.
There are 8 crates that do this
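To show why a misaligned comment keeps its asterisks, here is a simplified model of the common-prefix trim. It is an illustration under assumptions of my own (only spaces as indentation, lines passed without the `/**` and `*/` delimiters) and is not rustdoc's actual trimming code.

```rust
/// Returns the column of `*` if it is the first non-space character of
/// `line`, or `None` otherwise.
fn star_column(line: &str) -> Option<usize> {
    let col = line.len() - line.trim_start_matches(' ').len();
    if line[col..].starts_with('*') {
        Some(col)
    } else {
        None
    }
}

/// Simplified model of trimming the ` *` gutter from the interior lines of a
/// block doc comment. If every line has `*` as its first non-space character
/// at the same column, everything up to and including that `*` is removed.
/// Otherwise the lines come back unchanged, asterisks included.
fn trim_star_gutter(lines: &[&str]) -> Vec<String> {
    let first = lines.first().and_then(|l| star_column(l));
    match first {
        Some(col) if lines.iter().all(|l| star_column(l) == Some(col)) => {
            lines.iter().map(|l| l[col + 1..].to_string()).collect()
        }
        _ => lines.iter().map(|l| l.to_string()).collect(),
    }
}
```

When one line is indented differently from the others, nothing is trimmed, so the Markdown parser sees the leading `*` characters and treats them as list markers.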
Q4: numbered list that starts with zero and has two ones
0. do stuff
1.
2. do more stuff
3. do more stuff
The 0 isn't a valid list item in commonmark, so that first line is a paragraph. The divergence between the new and old parser is on the second line, but I'm not sure what the intended result by the crate author was? I think it's supposed to be a list starting at zero, but it doesn't work in the old version or the new one.
There is 1 crate that does this
Q5: incorrect footnote definition markup
The parser used to return a dangling footnote reference, but the new one doesn't. Both saw the intended footnote definition as invalid.
There are 18 crates that do this
Q6: block quotes written at start of line
Try writing block quotes like this:
>
First paragraph
>
Second paragraph
I think this used to work in GitHub-flavored markdown (it works in Pandoc-flavored markdown now), and seems to be what the author intended in the 1 crate that hits this case, but it doesn't work on GitHub now, and it never worked right in rustdoc (though the exact way it fails changed, which is what brought it to my attention).
There is 1 crate that does this
(S)purious cases where the lint fires and it shouldn't
S1: loose task list
The lint I wrote fires false positives for task lists. The event stream from the parser is different, thanks to https://github.com/pulldown-cmark/pulldown-cmark/pull/558, but they're detected in the same cases.
There are 7 crates that hit this
S2: minor change in HTML blocks
The indentation is handled differently, but will usually generate the same rendered result.
There are 3 crates that hit this
S3: two spaces on the last line of a paragraph
Strictly speaking, this is a different parse result. It will produce slight differences in the spacing.
There is 1 crate that hits this
S4: spans are different when emphasis wraps code
This causes the lint to claim there's a problem when there isn't.
There is 1 crate that hits this
Thanks for the very detailed report! What do you plan to do from this?
Looking at some of the Ps and Qs with more than 10 matches (I'm just going to be happy about the Fixes with more than 10 matches).
P1: I'm making a PR against pulldown-cmark to limit single-tilde strikethroughs to the same rules as underscores. GitHub might render first~second third~fourth as first<del>second third</del>fourth, but markdown-it, the parser used in Discourse, does not, and neither does commonmark-hs, the parser used in Pandoc "GFM" mode. https://github.com/pulldown-cmark/pulldown-cmark/pull/864
P2: This should be a lint. The alternative would be willfully violating the spec, which I'd rather not do. The heuristic would be: if the > is not followed by a space, then suggest that the user either add a space or escape the > with a backslash.
P3: Not sure if this should be a lint or if it should just be fixed.
Q1: We can't help you if you aren't even trying to write useful Markdown.
Q5: We should detect and warn about [^WHATEVER]-like footnotes with no corresponding definition that aren't backslash escaped. The trigger is the same as P3, but the suggestion would differ based on sniffing around for valid footnote defs or link refs right next to the footnote.
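A rough sketch of the Q5 detection is below. It is a plain text scan written for illustration, with a made-up function name; a real lint would more likely work from parser events and spans, and this version does not handle cases like a double backslash before the bracket.

```rust
use std::collections::HashSet;

/// Returns the labels of `[^label]` references that are not backslash
/// escaped and have no `[^label]:` definition at the start of any line.
fn undefined_footnote_refs(doc: &str) -> Vec<String> {
    let mut defined: HashSet<String> = HashSet::new();
    let mut referenced: Vec<String> = Vec::new();

    for line in doc.lines() {
        let mut search_from = 0;
        while let Some(found) = line[search_from..].find("[^") {
            let start = search_from + found;
            let label_start = start + 2;
            let len = match line[label_start..].find(']') {
                Some(len) => len,
                None => break, // no closing bracket on this line
            };
            let label = &line[label_start..label_start + len];
            let after = label_start + len + 1;
            search_from = after;

            let escaped = line[..start].ends_with('\\');
            if escaped || label.is_empty() || label.contains(char::is_whitespace) {
                continue;
            }
            // Treat `[^label]:` with only whitespace before it as a definition.
            let is_definition =
                line[..start].trim().is_empty() && line[after..].starts_with(':');
            if is_definition {
                defined.insert(label.to_string());
            } else {
                referenced.push(label.to_string());
            }
        }
    }

    referenced.retain(|label| !defined.contains(label));
    referenced
}
```

A regex character class such as `[^a-z]` in prose is reported by this scan, which matches the kind of unintended footnote reference described in F1 and Q5.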
The solution for the spurious problems is to write lints for these specific problems instead of linting on every spot where the parsers disagree. It's also valuable to avoid setting off The Everything's Okay Alarm when an author's docs are fixed.