Link checker fails internal links

Open klieret opened this issue 3 years ago • 15 comments

The link checker fails all links in recent PRs, see e.g. https://github.com/HSF/hsf.github.io/pull/1045 :+1:

All the link check failures seem to be the same kind as yesterday and all false positives. Maybe @klieret has an idea - it seems that {{site.baseurl}} is expanding to an empty string and then the URL checker is trying for an absolute path, instead of a relative one.

klieret avatar Jan 18 '22 16:01 klieret

We had to slightly tweak the link checker before to resolve internal links (see this change).

~~I wonder if something goes wrong with the regular expression that is used there to cause this...~~

klieret avatar Jan 18 '22 16:01 klieret

Hmm, I have no idea why this isn't working.

I confirmed, for example with a find command, that this file exists:

/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html

or even with file:

/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html: HTML document, UTF-8 Unicode text, with very long lines

But then later the link checker complains that

2022-01-18T17:37:10.3403364Z [✖] /__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html → Status: 400 [Error: ENOENT: no such file or directory, access '/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html'] {
2022-01-18T17:37:10.3403782Z   errno: -2,
2022-01-18T17:37:10.3404010Z   code: 'ENOENT',
2022-01-18T17:37:10.3404232Z   syscall: 'access',
2022-01-18T17:37:10.3404597Z   path: '/__w/hsf.github.io/hsf.github.io/_site/training/curriculum.html'
2022-01-18T17:37:10.3404854Z }

It's as if the markdown link checker can't see the files generated by the previous steps, but that shouldn't be the case.

klieret avatar Jan 18 '22 17:01 klieret

This is in fact the same issue that is reported here: https://github.com/tcort/markdown-link-check/issues/96 (but it doesn't provide much information that we don't know already)

klieret avatar Jan 18 '22 17:01 klieret

I'm currently out of ideas @graeme-a-stewart

What we could do as a non-perfect solution is to use the replacement pattern to prefix https://hepsoftwarefoundation.org. This would fail if you add a new markdown file and then directly link to it in the same PR, but should work for all the other links.
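To make the proposed workaround concrete, here is a minimal sketch of what such a replacement would do to a link. It is plain Python, not the link checker's own code: the function name `rewrite_link` is hypothetical, and the regular expression is an escaped equivalent of the pattern from the link checker config.

```python
import re

# Escaped Python equivalent of the config pattern for {{ site.baseurl }}/ links.
BASEURL_PATTERN = re.compile(r"^\s*\{\{\s*site\.baseurl\s*\}\}/(.*)")

# Prefix proposed above: check links against the published site.
PUBLISHED_PREFIX = "https://hepsoftwarefoundation.org/"


def rewrite_link(link: str) -> str:
    """Swap a leading {{ site.baseurl }}/ for the published site prefix.

    Hypothetical helper for illustration only; links without the
    placeholder are returned unchanged.
    """
    return BASEURL_PATTERN.sub(lambda m: PUBLISHED_PREFIX + m.group(1), link, count=1)


print(rewrite_link("{{ site.baseurl }}/training/curriculum.html"))
print(rewrite_link("https://example.org/untouched"))
```

Because the rewritten link points at the live site, a page that exists only in the PR would not be found there, which is the limitation described above.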

klieret avatar Jan 18 '22 17:01 klieret

Just spitballing here, would it help if instead of

    "pattern": "^\\s*{{\\s*site.baseurl\\s*}}/(.*)",
    "replacement": "/_site/$1"

we made it a relative path, viz.

    "pattern": "^\\s*{{\\s*site.baseurl\\s*}}/(.*)",
    "replacement": "./_site/$1"

graeme-a-stewart avatar Jan 21 '22 09:01 graeme-a-stewart

Nope, that also doesn't work:

[✖] _site/training/curriculum.html → Status: 400 [Error: ENOENT: no such file or directory, access '/github/workspace/_workinggroups/_site/training/curriculum.html'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'access',
  path: '/github/workspace/_workinggroups/_site/training/curriculum.html'
}

In fact, it looks like the link checker actually changes directories to that of the current file. I don't think it did that in the past. This probably made the previous solution fail.
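The path in the error is consistent with the relative replacement being resolved against the directory of the file being checked. The sketch below only models that apparent behavior with Python's `posixpath`; it is not taken from markdown-link-check, and the file name `example.md` and the helper `resolve_from_file_dir` are assumptions made for illustration.

```python
import posixpath


def resolve_from_file_dir(checked_file: str, replaced_link: str) -> str:
    """Resolve a link target relative to the directory of the checked file."""
    base_dir = posixpath.dirname(checked_file)
    return posixpath.normpath(posixpath.join(base_dir, replaced_link))


# Hypothetical file name; only its directory appears in the log above.
checked = "/github/workspace/_workinggroups/example.md"
print(resolve_from_file_dir(checked, "./_site/training/curriculum.html"))
# prints /github/workspace/_workinggroups/_site/training/curriculum.html
```

Under this model, `./_site/...` lands inside `_workinggroups/` instead of the repository root, where the generated `_site` directory would be expected.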

klieret avatar Jan 21 '22 14:01 klieret

Thanks @klieret. So do we need to construct the full absolute path then? Slipping a $(pwd) in there somewhere? In the GitHub Actions CI we do know what the absolute path is, right?

graeme-a-stewart avatar Jan 21 '22 15:01 graeme-a-stewart

The thing is, I already tried with full absolute paths (it's static, so we can just hard-code it) and it failed as well. That's what's confusing me.

I just tried with a local installation of markdown-link-check and there it works with absolute paths.

Let me try again on the gh action

klieret avatar Jan 21 '22 15:01 klieret

No, absolute paths don't work either. Reproduced my previous comment again.

klieret avatar Jan 21 '22 15:01 klieret

(Though linked to this issue, the merged PR is only a partial fix, so I'm keeping this open)

klieret avatar Jan 25 '22 11:01 klieret

Isn't this issue solved now?

hegner avatar Feb 14 '23 09:02 hegner

There should still be one loophole, though it doesn't seem to come up often in practice:

This is only a half-hearted fix: It will fail if you create a new page and link to it before it is published.

(from my notes to https://github.com/HSF/hsf.github.io/pull/1051)

klieret avatar Feb 14 '23 15:02 klieret

Note that this edge case is not triggered by e.g., new GSoC pages, because there the interlinking (to project/organization etc.) is generated from the yaml frontmatter, so the markdown link checker doesn't find anything to check.

klieret avatar Feb 14 '23 16:02 klieret

Though looking at this again, I wonder if we could use absolute local paths for the replacement and then set baseURL and projectBaseURL so that those paths are resolved against the root of the file system rather than the base directory. I think this might be a setting that we missed previously.

klieret avatar Feb 14 '23 16:02 klieret

I want to contribute to this issue. Please assign it to me, and if you have seen this, let me know.

testgithubsonika avatar Aug 27 '23 20:08 testgithubsonika

Superseded by https://github.com/HSF/hsf.github.io/issues/1559.

eduardo-rodrigues avatar Jul 04 '24 14:07 eduardo-rodrigues