zola
zola copied to clipboard
Warn about unprocessed Markdown in rendered HTML
Bug Report
Environment
Zola version: 0.13
Background
I recently upgraded from Zola 0.9 to 0.13. Along the way Markdown processing must have changed:
0.9:
0.13:
Since the site is a bit longer and there were no errors it took me a few days to realize this happened.
Expected Behavior
First of all, I'm not even sure if the new Markdown processing is a bug or a feature. I'm assuming it's a feature, and I can respect that authors might have to adjust their page a bit along the way.
However, it would be nice if Zola could warn that something broke that previously worked.
A minimum feature could be to simply warn if the rendered HTML contains [xxx](yyy)
, _xxx_
or *xxx*
sequences, maybe as part of zola check
(maybe even with an extra option zola check --xxx
.
Step to reproduce
Clone https://github.com/ralfbiedert/cheats.rs
, render with 0.13 or 0.9.
Update
On a side note, starting a line with a shortcode {{ note() }}
was what caused the markdown to render differently:
{{ note(note="1") }} Technically `async` transforms following code into anonymous, compiler-generated state machine type; `f()` instantiates that machine. <br>
That's from a change from 0.12 in how shortcodes are integrated I believe. In that case it will replace the shortcode with it with a <pre>
and since it starts with HTML, the Markdown parser ignores the rest.
I'm not sure how to fix that tbh, the alternative was randomly breaking shortcodes depending on the indentation. I believe you can pass the text to the shortcode and use the markdown
filter in there.
A minimum feature could be to simply warn if the rendered HTML contains xxx, xxx or xxx sequences, maybe as part of zola check (maybe even with an extra option zola check --xxx.
That would have to be an option otherwise every blog post talking about Markdown would fail it.
I'm not sure if it would fall under this issue or a separate one, but it'd also be nice to have a warning (or a flag to enable a warning) to alert on malformed shortcodes instead of silently ignoring them.
alert on malformed shortcodes
How do we know whether it's a malformed shortcode though?
I'll do some testing, but I think just at a glance alerting on the error from https://github.com/getzola/zola/blob/master/components/rendering/src/shortcode.rs#L126 might be helpful. Could also do a search afterwards for {%
as was mentioned in the original issue.
Will look into what errors tera throws as well, possibly could address the problem at that level.
I still don't know if there is a good way to do that.
How about zola check --assert-no-strange-patterns
(or similar setting in the config.toml
; name TBD) which via regex checks there are no patterns of type
-
(...)[...]
-
`...`
-
*...*
-
_..._
- whatever a malformed short code looks like,
{{ }
and{ }}
?
where ...
would mostly be alphanumerical and white space. In particular any <>
inside ...
would make the test pass.
That list could be expanded later, and people authoring blogs about Markdown would just not enable that option. I even think the logic could be inverted right from the start, like --ignore-strange-patterns
, since most people write about other things than markdown.
The issue is that you can also have HTML or any kind of script in the Markdown file, doing it via regex is going to be very error prone
I might be missing something, but I've sketched what I mean here:
https://github.com/ralfbiedert/rust_issues/tree/zola_unprocessed_markdown
The idea is to process the generated HTML and search for text nodes with broken patterns. I've tested this with two versions of my site, the clean
one is the vanilla version of my site, and the broken
one has a simulated issue as in my original ticket.
https://github.com/ralfbiedert/rust_issues/commit/6cd1f1b370ba908f5f8d5893739978c897898f12#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fcR65-R68 are all valid. it could be the user decided to put <p>*some text*</p>
in their markdown.
Or even a shortcode outputting some JS like
/* does foo */
function() {}
wouldn't pass that test. I don't think there is a way to detect unprocessed content reliably so I wouldn't try it.
I buy that I might have a niche use case, that the effort of implementing these checks is (currently) high and that they shouldn't be enabled by default.
However, if you ever implemented a modular checking system where these things could be added and enabled one-by-one, then I'd happily enable most of them, as they seemingly (compare link above) would have been an effective measure preventing some bugs on my site.
Now that I think about it ... I think of these more of a "clippy" system of quality lints to opt into, which may have false positives, but people can adjust them to the needs of their site. But anyway, no worries, it would just be a nice to have feature for me, not a critical one.
Can that be done with ripgrep on the public/ directory in the meantime? The issue with false positives linting is that you need a way to skip some so that means adding some new concept to the markdown parser just for that.
Can that be done with ripgrep on the public/ directory in the meantime?
Yes, I think it must to be done on public
as it needs both an HTML parser like visdom
to walk text nodes, and regex
to check node content (at least if you want to minimize false positives).
so that means adding some new concept to the markdown parser just for that.
I don't really know zola architecture, but extending lint approach it would be fantastic if zola could also check
the generated output, which could allow you to:
- verify the HTML is well formed (I've shipped sites with missing closing tags few times already)
- local
src
ormeta
attributes point to existing resources (same, esp. for more subtle CSS + fonts)
In other words, there might be some features / correctness benefits for not just looking at the markdown files.