
All markdown files skipped when I make unrelated changes to skip pattern

Open chalin opened this issue 2 years ago • 2 comments

I'm running under macOS, with these versions:

$ nvm current
v16.13.0
$ npm --version
8.4.0
$ npm ls linkinator
opentelemetry-specification@ /Users/chalin/git/lf/open-telemetry/opentelemetry-specification
└── [email protected]

I'm using linkinator to check markdown files in a repo, something like this:

$ npx linkinator '**/*.md' --markdown --skip '^https|node_modules' --verbosity error
🏊‍♂️ crawling **/*.md
[0] http://localhost:9411/api/v2/spans%22
specification/sdk-environment-variables.md
  [0] http://localhost:9411/api/v2/spans%22
ERROR: Detected 1 broken links. Scanned 137 links in 6.076 seconds.

That works great.

Now, I'd like to skip all links starting with either http or https, so I add ? to the skip regex:

$ npx linkinator '**/*.md' --markdown --skip '^https?|node_modules' --verbosity error
🏊‍♂️ crawling **/*.md
🤖 Successfully scanned 0 links in 0.186 seconds.

If I use --verbosity info, I see that it is now skipping all files. Why? Is there something I'm missing?

It also skips all files when I try these regexes:

  • http|https|node_modules
  • http|https
  • https?

Thanks!

chalin avatar Feb 23 '22 20:02 chalin

Heh, so this is a kind of funny problem. Linkinator works by mounting the local folder you're scanning as a static HTTP server, so most likely all links are getting picked up by this regex. Could you share a little more about the use case? What specifically are you trying to skip?
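To make that concrete, here is a minimal sketch of the matching behaviour described above. It rests on several assumptions: that a local markdown file is checked under a URL of the form `http://localhost:<port>/<path>`, that the skip pattern is tested against that full URL, and that Python's `re.search` is an acceptable stand-in for JavaScript's regex matching for patterns this simple. The port, the paths, and the `is_skipped` helper are invented for illustration and are not part of linkinator.

```python
import re


def is_skipped(pattern: str, url: str) -> bool:
    """Return True if the skip pattern matches anywhere in the URL."""
    return re.search(pattern, url) is not None


# Hypothetical URLs: the port and paths are invented for illustration.
local_page = "http://localhost:54321/specification/overview.md"
external_link = "https://opentelemetry.io/docs/"

patterns = ["^https|node_modules", "^https?|node_modules", "http|https", "https?"]

for pattern in patterns:
    # Print whether each pattern would skip the local page and the external link.
    print(pattern, is_skipped(pattern, local_page), is_skipped(pattern, external_link))
```

Under these assumptions, only `^https|node_modules` leaves the locally served page unskipped. Every other pattern matches the `http` at the start of the local URL, which would explain why zero links get scanned.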

JustinBeckwith avatar May 08 '22 05:05 JustinBeckwith

> Could you share a little more about the use case?

I'm looking for a new link checker to use over the markdown files in https://github.com/open-telemetry/opentelemetry-specification.

> What specifically are you trying to skip?

Specifically, I'm trying to skip all

  • external links
  • explicit links to http://localhost

in the markdown files in that directory.

Thanks for getting back to me. Sorry for being late to respond. Should I open another issue, or will you reopen this one?

chalin avatar Aug 14 '22 20:08 chalin

Here's another related question. Consider running linkinator on this repo:

# /workspace/linkinator (main) 
$ npx linkinator '**/*.md' --markdown --skip '^https|node_modules' --verbosity error
🏊‍♂️ crawling **/*.md
[404] test/fixtures/directoryIndex/dir1/
[404] test/fixtures/directoryIndex/dir2/
[404] test/fixtures/rewrite/NOTLICENSE.md
test/fixtures/directoryIndex/README.md
  [404] test/fixtures/directoryIndex/dir1/
  [404] test/fixtures/directoryIndex/dir2/
test/fixtures/rewrite/README.md
  [404] test/fixtures/rewrite/NOTLICENSE.md
ERROR: Detected 3 broken links. Scanned 16 links in 1.295 seconds.

Question: How would I add a pattern to skip all links whose paths start with test?

Note that ^test doesn't work:

# /workspace/linkinator (main) 
$ npx linkinator '**/*.md' --markdown --skip '^https|node_modules|^test' --verbosity error
🏊‍♂️ crawling **/*.md
[404] test/fixtures/directoryIndex/dir1/
...

As you hinted at earlier, it seems that the skip pattern ^http matches all links, including those whose href is just a path (without a protocol).
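A small sketch of why `^test` might not match, assuming that relative hrefs are resolved to full `http://localhost:<port>/...` URLs before the skip pattern is applied. The port is invented, and the second pattern is a hypothetical alternative rather than a confirmed recipe:

```python
import re

# Hypothetical resolved URL; the port is invented for illustration.
resolved = "http://localhost:54321/test/fixtures/rewrite/NOTLICENSE.md"

# Anchored at the start of the string, so it can never match a full URL.
anchored_at_start = re.compile(r"^test")

# Hypothetical alternative: anchor on the local host, then the path prefix.
anchored_after_host = re.compile(r"^http://localhost:\d+/test/")

print(bool(anchored_at_start.search(resolved)))
print(bool(anchored_after_host.search(resolved)))
```

If the assumption holds, the first pattern fails because the string starts with `http`, not `test`, while the second matches the path prefix once the scheme, host, and port are accounted for.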

chalin avatar Aug 15 '22 14:08 chalin

Looked at the code and figured it out, thanks! Closing.

chalin avatar Aug 16 '22 08:08 chalin

> Specifically, I'm trying to skip all
>
> * external links
> * explicit links to `http://localhost`

@chalin did you find how to do this?

danieleds avatar Oct 04 '22 14:10 danieleds

Hi @danieleds. This is the value I ended up using for the skip parameter:

  • ^http://localhost:9411|/node_modules/|^https://github\\.com/some-org/some-project/(issues|pull)

I hope that this helps.
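For anyone wanting to check what that value matches, here is a rough sketch. It assumes the doubled backslash is JSON escaping (as it would be if the value lives in a JSON config file), so the regex itself contains a single `\.`. It also assumes that the pattern is tested against full URLs and that Python's `re` behaves like JavaScript's regex engine for this pattern. The sample URLs, other than the `localhost:9411` one from the original report, are made up:

```python
import json
import re

# The value as it might appear in a JSON file; json.loads turns "\\." into "\.".
json_text = (
    r'"^http://localhost:9411|/node_modules/|'
    r'^https://github\\.com/some-org/some-project/(issues|pull)"'
)
skip = re.compile(json.loads(json_text))

# Hypothetical URLs for illustration; port 54321 is invented.
samples = [
    "http://localhost:9411/api/v2/spans",
    "http://localhost:54321/node_modules/foo/README.md",
    "https://github.com/some-org/some-project/issues/1",
    "http://localhost:54321/specification/overview.md",
    "https://opentelemetry.io/docs/",
]

for url in samples:
    print(url, "skipped" if skip.search(url) else "checked")
```

In this sketch the first three URLs are skipped and the last two are still checked, so the value targets specific links rather than every external link.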

chalin avatar Oct 13 '22 20:10 chalin

Was this fixed, or are you folks still relying on a workaround?

anshulsahni avatar Feb 14 '23 09:02 anshulsahni