
MD051: Enhance with optional ignore prefix or regex

Open tunetheweb opened this issue 3 years ago • 10 comments

MD051 is a very nice rule addition! However, some of the markdown files in our project have dynamically inserted figures whose anchors are named fig-1, fig-2, and so on. These anchors don't exist in the markdown, but are added as part of our build process.

What do you think about adding an optional config with a prefix or regex of links to ignore?

Something like:

MD051:
  ignore_prefix: "fig"

Or

MD051:
  ignore_prefixes: "fig,somethingelse"

Or

MD051:
  ignore_regex: "^(fig|somethingelse)"

I'd be willing to have a go at a PR for this if it sounds reasonable. Let me know if you have a preference for any of the above, or a preferred name/syntax.
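As a rough illustration of the options above, here is a minimal Python sketch of how a prefix list or a regex could filter link fragments before they are reported. This is not markdownlint's implementation; the function names `is_ignored` and `broken_fragments` and their parameters are hypothetical, and fragments are assumed to be passed without the leading `#`.

```python
import re


def is_ignored(fragment, prefixes=(), pattern=None):
    """Return True when a link fragment should be skipped by the check.

    `prefixes` is a tuple of literal prefixes (hypothetical ignore_prefixes),
    `pattern` is an optional regex string (hypothetical ignore_regex).
    """
    if any(fragment.startswith(prefix) for prefix in prefixes):
        return True
    return pattern is not None and re.search(pattern, fragment) is not None


def broken_fragments(fragments, anchors, prefixes=(), pattern=None):
    """List fragments that match no known anchor and are not ignored."""
    return [
        fragment
        for fragment in fragments
        if fragment not in anchors
        and not is_ignored(fragment, prefixes, pattern)
    ]
```

With `prefixes=("fig",)`, a link to `#fig-4` would be skipped while a mistyped heading link would still be reported. The trade-off raised later in the thread still applies: an ignored `fig-` link is never validated at all.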

tunetheweb avatar Aug 04 '22 09:08 tunetheweb

What about linting after the Markdown files are fully generated by the build? Otherwise you may have broken "fig-" links and won't know it.

DavidAnson avatar Aug 04 '22 16:08 DavidAnson

That is true and certainly a risk. In my case they are built into HTML and we do lint those. But apparently the HTML linter we use (HTMLHint) is not as good as markdownlint 😄 since we have several broken links that this check has only just surfaced. I could look at expanding that project to possibly have a similar rule to MD051 as an alternative to adding the exception here, if you’d prefer not to complicate this code base with an exception option.

tunetheweb avatar Aug 04 '22 16:08 tunetheweb

How many links do there tend to be in a document? Would supporting a list of strings be enough because you could provide a project level configuration that listed "fig-1" to "fig-10"? Or are there hundreds of these in a document?

DavidAnson avatar Aug 04 '22 16:08 DavidAnson

There can be a lot. Here’s an example one if you’re curious: https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/src/content/en/2021/css.md

We also have translations, and one thing that’s particularly apparent with this new rule is that the translators often leave the original links but translate the headings (which then obviously changes the heading anchor and so breaks the link). MD051 has surfaced a lot of those, which is why I’m keen to be able to use this check to prevent that in future. Those, and typos in links (or edits where we create links but then change the heading names), are the main use case for me.

Figure links are less of an issue for us, and a more complicated one to solve anyway, since the figure link likely exists but could be the wrong one if they all shift along when a new figure is inserted. But we also tend to link those less anyway, as we usually talk about the figures just after them and so don’t need a link.

tunetheweb avatar Aug 04 '22 17:08 tunetheweb

I may have been unclear. I was asking how many instances of figure links might be in a document. If it is very few, then it would be possible to provide a fixed size list of the first 10 and that would cover you. If there are very many, maybe a regular expression is more relevant. I don't see any matches for "fig-" in that document, so maybe I'm looking for the wrong thing?

DavidAnson avatar Aug 04 '22 19:08 DavidAnson

So there can be anything from a few to many figure links. That particular document has 67 figures. You can see the final published version here. In this example none of the figures are referenced by links.

Here’s one where there is a link: https://raw.githubusercontent.com/HTTPArchive/almanac.httparchive.org/main/src/content/en/2021/pwa.md (search for fig-4).

The nature of the publication is that most figures are discussed directly after the figure, so there is no need to link to them, but occasionally a figure elsewhere in the text is referenced and linked.

So I was thinking of being able to configure a prefix allow list (fig-) or regex, rather than having to list all possible figures that might be referenced by the authors.

tunetheweb avatar Aug 04 '22 21:08 tunetheweb

I'm worried that prefix is not general enough for other scenarios (if there are any?) and that regular expression is harder to work with for many folks.

I also feel kind of like the approach you describe now is quite fragile and may have a bunch of broken figure links already.

So I don't have an approach I like yet.

DavidAnson avatar Aug 04 '22 22:08 DavidAnson

FYI I managed to work around this using inline ignores on the affected lines since, luckily, we don't reference figures that much internally so this is feasible.

I still think it would be useful to have some sort of more generic override for dynamically inserted content like this, where the markdown is, in effect, a source file rather than the final output. I know I'm not the only one that does this (though I'm not aware of any others that explicitly link to generated content). I take on board your point above that it might be better to lint the final output HTML in these cases, though it's also nice to be able to flag items at source (so we get the correct line number), but without the noise of items markdownlint can't be expected to deal with. Maybe inline ignores are the best way of dealing with this, but that feels a little verbose for regular use, and listing all the possible figures in any MD051 config is almost as verbose. Maybe some kind of regex-lite like MD051: ignore: "fig-{int},somethingelse" would be a middle ground between full regex support and listing every permutation?
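To show what the regex-lite idea could mean in practice, here is a minimal Python sketch that expands a spec such as `fig-{int},somethingelse` into full-match patterns. The `{int}` placeholder syntax comes from the suggestion above; everything else (the names `compile_ignore` and `matches_ignore`, the comma splitting, the choice of whole-fragment matching) is an assumption for illustration, not an existing markdownlint option.

```python
import re


def compile_ignore(spec):
    """Turn a spec like 'fig-{int},somethingelse' into compiled patterns."""
    patterns = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # Escape the item so it is treated literally, then swap the escaped
        # {int} placeholder for "one or more digits".
        body = re.escape(item).replace(re.escape("{int}"), r"\d+")
        patterns.append(re.compile(body))
    return patterns


def matches_ignore(fragment, patterns):
    """Return True when the whole fragment matches any ignore pattern."""
    return any(pattern.fullmatch(fragment) for pattern in patterns)
```

Matching the whole fragment keeps the option narrower than a plain prefix: `fig-4` is ignored, but `fig-` or `figure` would still be checked.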

Anyway, I understand if you'd prefer not to handle this in markdownlint and so wish to close this issue. As I say, I've managed to work around it with existing functionality and still benefit from MD051, which has helped identify a lot of real issues/typos.

Thanks again for creating and supporting this tool!

tunetheweb avatar Aug 07 '22 11:08 tunetheweb

Great news! I'll leave this open as a possible enhancement and see if/what other scenarios come up.

DavidAnson avatar Aug 07 '22 17:08 DavidAnson

We hit a similar issue. Our project, which is hosted on Bitbucket, generates its README for Terraform with DocToc (https://github.com/thlorenz/doctoc). It generates a table of contents with links in this format:

[Terraform Documentation](#markdown-header-terraform-documentation)
    - [Requirements](#markdown-header-requirements)

and markdownlint marks links with the markdown-header prefix as invalid. Having a way to configure the linter to ignore these prefixes would resolve the problem.

For now I have to disable the rule.

mkrg-capco avatar Dec 21 '22 11:12 mkrg-capco