lychee
lychee copied to clipboard
Support setting a base for absolute links
By default, lychee doesn't seem to check absolute links (/
-prefixed), except when setting the --base
argument (this is weird by itself, since --base
should be for the resolution of absolute link as per documentation).
It would be great to:
- check absolute links by default, using the directory that lychee is run from as the root
- have a
--root
argument that works like--base
but for absolute links
Hi, we decided to not check absolute links by default, because it could generate a lot of false-positives when assuming the current directory as a root. For instance, many static site generators put files into a subfolder like public
or dist
and in this case, assuming using the directory that lychee is run from as the root would not work.
--base
can be a base url (e.g. https://example.com) or a directory path (acting like the --root
you proposed). Setting both would be confusing. When to use the url and when the path when encountering an absolute link?
Hey Mathias! Thanks for the prompt answer!
There might a bit of a confusion, my suggestion is not about path vs URLs. Here's what I propose:
- both
--base
and--root
can be set to either a path or an url - when encoutering a relative path, prepend the value of
--base
- when encountering an absolute path (starting with
/
, but nothttp://
,file://
, etc), prepend the value of--root
- only check absolute paths if the
--root
option is provided
What do you think?
(oh and sorry for posting this in the wrong repo, I had both open and was distracted!)
I've created a similar issue for another project: https://github.com/wjdp/htmltest/issues/184
Basically, if you're generating for a specific subdirectory, it doesn't work as expected, for example:
# imagine a generator here which creates all URLs like <a href="/something">
$ ./my-generator --base=https://example.com --out=./public
$ lychee --offline --base ./public ./public
🔍 1464 Total ✅ 1269 OK 🚫 0 Errors 💤 195 Excluded
works as expected.
# imagine a generator here which creates all URLs like <a href="/some/path/something">
$ ./my-generator --base=https://example.com/some/path --out=./public
$ lychee --offline --base ./public ./public
🔍 1464 Total ✅ 16 OK 🚫 1253 Errors (HTTP:1253) 💤 195 Excluded
It's trying to find files at ./public/some/path
which doesn't exist, since the content of the ./public
folder will be deployed to that location as is.
If I hack it with:
$ mkdir public/some
$ ln -s ../../public public/some/path
$ ll -d public/some/path
lrwxrwxrwx 1 dalibor.karlovic dalibor.karlovic 12 sij 19 09:54 public/some/path -> ../../public
now it works since paths will be evaluated like ./public/some/path/images/logo.svg
via the symlink.
It would be great if either base or root could be set when running in local mode to say
The content of this folder will be deployed to
https://example.com/some/path
, resolve links you find in it like this (treat them like offline links):
https://example.com/some/path/images/logo.svg => ./public/images/logo.svg
/some/path/images/logo.svg => ./public/images/logo.svg
Just revisited this issue and your suggestions make sense. We can use the examples for unit tests. I agree that there needs to be a separation between --base
and --root
as they don't serve the same purpose. --offline
can probably deprecated once we have --root
.
@mre don't know if this is a new issue, so I'll shortly discuss it here since it's (IMO) related to the feature described here, which doesn't (yet) exists:
if you're deploying to most static site hosting platforms, they support the index file, where the URL https://example.com/foo/bar will route to ./public/foo/bar/index.html
. If you're validating URLs in files generated with this new --root
param, the validation process should allow for this, namely
https://example.com/foo/bar => ./public/foo/bar/index.html
https://example.com/foo/bar/index.html => ./public/foo/bar/index.html
/foo/bar/index.html => ./public/foo/bar/index.html
/foo/bar => ./public/foo/bar/index.html
Your examples are for lychee --root ./public
, right?
BASE_URL=https://example.com/some/path ./my-generator --out=./public
You mentioned two examples in the other issue that we can use as tests:
https://example.com/some/path/images/logo.svg => ./public/images/logo.svg
/some/path/images/logo.svg => ./public/images/logo.svg
I think all of this can be done with the current proposal (?). So if we have support for both --base
and --root
, we should be able to express all cases -- unless I'm missing something here. So, no new issue needed I think. Correct me if I'm wrong.
@mre OK, I'll create another issue to discuss the missing index file resolving, thanks!
Quick note to self:
we should call it --base-url
and --root-path
to be more explicit about the purpose of these options.
@mre any chance to get these --base-url
and --root-path
options prioritised?
It's currently difficult to get lychee working properly on a GitHub pages repository as GitHub pages publishes to a subdirectory unless you configure a custom domain.
@vanbroup I just now figured how to solve it for me.
In my case, we have a template site that when deploying to GitHub pages create absolute links internally. The template have a parameter in _config.yml
that is baseurl
, which is used when creating the urls. In the template demo it is set to baseurl: /al-folio
. After building the jekyll template the directory _site/
is created, which is the one that will be deployed to GitHub pages.
In it we have, for example, a page with path _site/projects/4_project/index.html
that internally links to images with path /al-folio/assets/img/11.jpg
. This absolute path /al-folio/
actually is the _site/
path in the final version, meaning it will point to _site/assets/img/11.jpg
.
I am running lychee
on the _site/
directory before sending it to GitHub pages, but when testing for this image link, for example, by default it tests it as file:///home/runner/work_site/al-folio/_site/projects/4_project/al-folio/assets/img/11.jpg
instead of as file:///home/runner/work_site/al-folio/_site/assets/img/11.jpg
. To force this behavior I give lychee the following parameters:
--offline --remap '_site(/?.*)/assets/(.*) _site/assets/$2' --verbose --no-progress '_site/**/*.html'
The remap
argument helps me to use regex to fix the urls. You can check the GitHub action.
Great workaround.
In fact, I think --remap
would work for most (all?) the cases that --base-url
and --root-path
would be used for.
@vanbroup, can you try to use --remap
for your case?
Of course, I still think we should have --base-url
and --root-path
at some point, because it is a very common operation.
It's nice the new remap feature is so flexible, but I agree it would be beneficial to provide a simpler UX for this. :+1: