Link checker fails internal links
The link checker fails all links in recent PRs, see e.g. https://github.com/HSF/hsf.github.io/pull/1045
All the link check failures seem to be the same kind as yesterday and all false positives. Maybe @klieret has an idea - it seems that {{site.baseurl}} is expanding to an empty string and then the URL checker is trying for an absolute path, instead of a relative one.
We had to slightly tweak the link checker before to resolve internal links (see this change).
~~I wonder if something goes wrong with the regular expression that is used there to cause this...~~
Hmm, I have no idea why this isn't working.
For example, I confirmed with a find command that this file exists:
/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html
or even with file:
/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html: HTML document, UTF-8 Unicode text, with very long lines
But then later the link checker complains that
2022-01-18T17:37:10.3403364Z [✖] /__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html → Status: 400 [Error: ENOENT: no such file or directory, access '/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html'] {
2022-01-18T17:37:10.3403782Z errno: -2,
2022-01-18T17:37:10.3404010Z code: 'ENOENT',
2022-01-18T17:37:10.3404232Z syscall: 'access',
2022-01-18T17:37:10.3404597Z path: '/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html'
2022-01-18T17:37:10.3404854Z }
It's as if the markdown link checker can't see the files generated by the previous steps, but that shouldn't be the case.
This is in fact the same issue that is reported here: https://github.com/tcort/markdown-link-check/issues/96 (but it doesn't provide much information that we don't know already)
I'm currently out of ideas @graeme-a-stewart
What we could do as a non-perfect solution is to use the replacement pattern to prefix https://hepsoftwarefoundation.org. This would fail if you add a new markdown file and then directly link to it in the same PR, but should work for all the other links.
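To make the proposed workaround concrete, here is a minimal Python sketch of the substitution it implies, assuming internal links are written as `{{ site.baseurl }}/...`. The function name `to_published_url` and the constant names are hypothetical, and the regex only mimics the style of replacement pattern the link checker config uses (braces are escaped for Python's `re`); it is not the checker's own code.

```python
import re

# Assumed link style: "{{ site.baseurl }}/path/to/page.html".
# Braces are escaped for Python; the rest mirrors a typical replacement pattern.
BASEURL_PATTERN = re.compile(r"^\s*\{\{\s*site.baseurl\s*\}\}/(.*)")

# The published site named in the comment above.
PUBLISHED_SITE = "https://hepsoftwarefoundation.org"


def to_published_url(link: str) -> str:
    """Rewrite a {{ site.baseurl }}/... link to the published site.

    Links that do not start with the baseurl placeholder are returned unchanged.
    """
    return BASEURL_PATTERN.sub(lambda m: f"{PUBLISHED_SITE}/{m.group(1)}", link)


if __name__ == "__main__":
    print(to_published_url("{{ site.baseurl }}/training/curriculum.html"))
    print(to_published_url("https://github.com/HSF/hsf.github.io"))
```

Because the rewritten link points at the live site, a page that is added and linked in the same PR would not exist there yet, which is exactly the limitation described above.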
Just spitballing here, would it help if instead of
"pattern": "^\\s*{{\\s*site.baseurl\\s*}}/(.*)",
"replacement": "/_site/$1"
we made it a relative path, viz.
"pattern": "^\\s*{{\\s*site.baseurl\\s*}}/(.*)",
"replacement": "./_site/$1"
Nope, that also doesn't work:
[✖] _site/training/curriculum.html → Status: 400 [Error: ENOENT: no such file or directory, access '/github/workspace/_workinggroups/_site/training/curriculum.html'] {
errno: -2,
code: 'ENOENT',
syscall: 'access',
path: '/github/workspace/_workinggroups/_site/training/curriculum.html'
}
In fact, it looks like the link checker actually changes directories to that of the file being checked. I don't think it did that in the past. This is probably what made the previous solution fail.
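The path resolution suggested by the log can be illustrated with a small Python sketch. It assumes the checker resolves each replaced link against the directory of the markdown file being checked; this is inferred from the error output, not taken from the checker's source. The helper name `resolve` is hypothetical.

```python
import posixpath


def resolve(link_target: str, checked_file_dir: str) -> str:
    """Resolve a link target the way a process whose working directory
    is checked_file_dir would (POSIX path semantics)."""
    return posixpath.normpath(posixpath.join(checked_file_dir, link_target))


if __name__ == "__main__":
    # Directory taken from the error message above.
    file_dir = "/github/workspace/_workinggroups"

    # Relative replacement: ends up inside the directory of the checked file.
    print(resolve("./_site/training/curriculum.html", file_dir))

    # Absolute replacement: ignores the directory and starts at the filesystem root.
    print(resolve("/_site/training/curriculum.html", file_dir))
```

Under this assumption, the relative form yields the `/github/workspace/_workinggroups/_site/...` path seen in the log, while the `/_site/...` form points at the filesystem root rather than the repository checkout, so neither lands on the generated `_site` directory.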
Thanks @klieret. So do we need to construct the full absolute path then? Slipping a $(pwd) in there somewhere? In the GitHub Actions CI we do know what the absolute path is, right?
The thing is, I already tried with full absolute paths (it's static, so we can just hard-code it) and it failed as well. That's what's confusing me.
I just tried with a local installation of markdown-link-check and there it works with absolute paths.
Let me try again on the gh action
No, absolute paths don't work either. Reproduced my previous comment again.
(Though linked to this issue, the merged PR is only a partial fix, so I'm keeping this open)
Isn't this issue solved now?
There should still be one loophole, though it doesn't seem to come up often in practice:
This is only a half-hearted fix: It will fail if you create a new page and link to it before it is published.
(from my notes to https://github.com/HSF/hsf.github.io/pull/1051)
Note that this edge case is not triggered by e.g., new GSoC pages, because there the interlinking (to project/organization etc.) is generated from the yaml frontmatter, so the markdown link checker doesn't find anything to check.
Though looking at this again, I wonder if we could use absolute local paths for the replacement and then set baseURL, projectBaseURL to ensure that they are not with respect to the base directory but really to the root of the file system. I think this might be a setting that we missed previously.
I would like to contribute to this issue. Please assign it to me, and let me know once you have seen this.
Superseded by https://github.com/HSF/hsf.github.io/issues/1559.