
Support setting a base for absolute links

Open norswap opened this issue 2 years ago • 12 comments

By default, lychee doesn't seem to check absolute links (/-prefixed) unless the --base argument is set (which is odd in itself, since --base should be for the resolution of absolute links as per the documentation).

It would be great to:

  • check absolute links by default, using the directory that lychee is run from as the root
  • have a --root argument that works like --base but for absolute links

norswap avatar Jan 10 '22 16:01 norswap

Hi, we decided not to check absolute links by default, because assuming the current directory as the root could generate a lot of false positives. For instance, many static site generators put files into a subfolder like public or dist, and in that case using the directory that lychee is run from as the root would not work.

--base can be a base URL (e.g. https://example.com) or a directory path (acting like the --root you proposed). Setting both would be confusing: when encountering an absolute link, when should the URL be used and when the path?

mre avatar Jan 10 '22 17:01 mre

Hey Mathias! Thanks for the prompt answer!

There might be a bit of confusion: my suggestion is not about paths vs. URLs. Here's what I propose:

  • both --base and --root can be set to either a path or a URL
  • when encountering a relative path, prepend the value of --base
  • when encountering an absolute path (starting with /, but not http://, file://, etc), prepend the value of --root
  • only check absolute paths if the --root option is provided
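To make the four rules above concrete, here is a minimal sketch in Python. It is not lychee's code: the function name `resolve` and its parameters are made up for illustration, and leaving a relative link unchanged when no base is given is an assumption, since the proposal doesn't cover that case.

```python
from typing import Optional


def resolve(link: str, base: Optional[str] = None, root: Optional[str] = None) -> Optional[str]:
    """Return the target to check, or None if the link should be skipped.

    Hypothetical helper illustrating the proposal; not part of lychee.
    """
    # Links that already carry a scheme (http://, file://, ...) are left alone.
    if "://" in link:
        return link
    # Absolute path: only checked when a root is provided, which is prepended.
    if link.startswith("/"):
        if root is None:
            return None
        return root.rstrip("/") + link
    # Relative path: prepend the base if there is one.
    # Assumption: without a base the link is returned unchanged.
    if base is None:
        return link
    return base.rstrip("/") + "/" + link


print(resolve("/images/logo.svg", root="./public"))
print(resolve("images/logo.svg", base="https://example.com/docs"))
print(resolve("/images/logo.svg"))
```

Both base and root are plain strings here, so either one can be a directory path or a URL, as in the first rule.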

What do you think?

(oh and sorry for posting this in the wrong repo, I had both open and was distracted!)

norswap avatar Jan 11 '22 00:01 norswap

I've created a similar issue for another project: https://github.com/wjdp/htmltest/issues/184

Basically, if you're generating for a specific subdirectory, it doesn't work as expected, for example:

# imagine a generator here which creates all URLs like <a href="/something">
$ ./my-generator --base=https://example.com --out=./public
$ lychee --offline --base ./public ./public
🔍 1464 Total ✅ 1269 OK 🚫 0 Errors 💤 195 Excluded

works as expected.

# imagine a generator here which creates all URLs like <a href="/some/path/something">
$ ./my-generator --base=https://example.com/some/path --out=./public
$ lychee --offline --base ./public ./public
🔍 1464 Total ✅ 16 OK 🚫 1253 Errors (HTTP:1253) 💤 195 Excluded

It's trying to find files at ./public/some/path which doesn't exist, since the content of the ./public folder will be deployed to that location as is.

If I hack it with:

$ mkdir public/some
$ ln -s ../../public public/some/path
$ ll -d public/some/path
lrwxrwxrwx 1 dalibor.karlovic dalibor.karlovic 12 sij  19 09:54 public/some/path -> ../../public

now it works since paths will be evaluated like ./public/some/path/images/logo.svg via the symlink.

It would be great if either base or root could be set when running in local mode to say

The content of this folder will be deployed to https://example.com/some/path, resolve links you find in it like this (treat them like offline links):

https://example.com/some/path/images/logo.svg => ./public/images/logo.svg
/some/path/images/logo.svg => ./public/images/logo.svg
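The two mappings above could be sketched roughly like this in Python. This is only an illustration of the requested behaviour, not anything lychee implements; `to_local_path` and its parameters are hypothetical names, and skipping links outside the deployed prefix (returning None) is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def to_local_path(link: str, site_url: str, out_dir: str) -> Optional[str]:
    """Map a link on the deployed site to a file under out_dir.

    Hypothetical helper; returns None for links that do not live under
    the deployed location (assumption: those are handled elsewhere).
    """
    site = urlsplit(site_url)
    prefix = site.path.rstrip("/")  # e.g. "/some/path"
    target = urlsplit(link)
    # A link to another host is not part of this deployment.
    if target.netloc and target.netloc != site.netloc:
        return None
    path = target.path
    # Only absolute paths below the deployed prefix can be mapped.
    if not path.startswith("/"):
        return None
    if path != prefix and not path.startswith(prefix + "/"):
        return None
    rest = path[len(prefix):].lstrip("/")
    return out_dir.rstrip("/") + "/" + rest


site = "https://example.com/some/path"
print(to_local_path("https://example.com/some/path/images/logo.svg", site, "./public"))
print(to_local_path("/some/path/images/logo.svg", site, "./public"))
```

With this kind of mapping the symlink hack is unnecessary, because the /some/path prefix is stripped before the file is looked up in ./public.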

dkarlovi avatar Jan 19 '22 08:01 dkarlovi

Just revisited this issue and your suggestions make sense. We can use the examples for unit tests. I agree that there needs to be a separation between --base and --root as they don't serve the same purpose. --offline can probably be deprecated once we have --root.

mre avatar Jan 31 '22 22:01 mre

@mre I don't know if this is a new issue, so I'll briefly discuss it here since it's (IMO) related to the feature described here, which doesn't (yet) exist:

if you're deploying to most static site hosting platforms, they support the index file, where the URL https://example.com/foo/bar will route to ./public/foo/bar/index.html. If you're validating URLs in files generated with this new --root param, the validation process should allow for this, namely

https://example.com/foo/bar  => ./public/foo/bar/index.html
https://example.com/foo/bar/index.html  => ./public/foo/bar/index.html
/foo/bar/index.html  => ./public/foo/bar/index.html
/foo/bar => ./public/foo/bar/index.html
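A rough sketch of that index-file fallback in Python, assuming the root is ./public as in the mappings above. `resolve_with_index` is a hypothetical name, the index file name is assumed to be index.html, and the host part of a full URL is simply ignored here for brevity.

```python
import os
from typing import Optional
from urllib.parse import urlsplit


def resolve_with_index(link: str, root: str, index: str = "index.html") -> Optional[str]:
    """Return the local file a link would be served from, or None.

    Hypothetical helper: tries the path itself first, then falls back
    to the index file inside a directory of that name.
    """
    rel = urlsplit(link).path.strip("/")
    target = os.path.join(root, *rel.split("/")) if rel else root
    # Exact file match, e.g. /foo/bar/index.html
    if os.path.isfile(target):
        return target
    # Directory-style URL, e.g. /foo/bar => /foo/bar/index.html
    candidate = os.path.join(target, index)
    if os.path.isfile(candidate):
        return candidate
    return None
```

All four links in the list above end up at the same file with this approach, as long as ./public/foo/bar/index.html exists on disk.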

dkarlovi avatar Feb 25 '22 16:02 dkarlovi

Your examples are for lychee --root ./public, right?

BASE_URL=https://example.com/some/path ./my-generator --out=./public

You mentioned two examples in the other issue that we can use as tests:

https://example.com/some/path/images/logo.svg => ./public/images/logo.svg
/some/path/images/logo.svg => ./public/images/logo.svg

I think all of this can be done with the current proposal (?). So if we have support for both --base and --root, we should be able to express all cases -- unless I'm missing something here. So, no new issue needed I think. Correct me if I'm wrong.

mre avatar Mar 03 '22 16:03 mre

@mre OK, I'll create another issue to discuss the missing index file resolving, thanks!

dkarlovi avatar Mar 03 '22 17:03 dkarlovi

Quick note to self: we should call it --base-url and --root-path to be more explicit about the purpose of these options.

mre avatar Nov 28 '22 12:11 mre

@mre any chance to get these --base-url and --root-path options prioritised?

It's currently difficult to get lychee working properly on a GitHub Pages repository, as GitHub Pages publishes to a subdirectory unless you configure a custom domain.

vanbroup avatar Jan 15 '24 08:01 vanbroup

@vanbroup I just figured out how to solve this for my case.

In my case, we have a template site that creates absolute links internally when deployed to GitHub Pages. The template has a baseurl parameter in _config.yml, which is used when creating the URLs. In the template demo it is set to baseurl: /al-folio. After building the Jekyll template, the directory _site/ is created, which is the one that will be deployed to GitHub Pages.

In it we have, for example, a page with path _site/projects/4_project/index.html that internally links to images with path /al-folio/assets/img/11.jpg. This absolute path /al-folio/ actually is the _site/ path in the final version, meaning it will point to _site/assets/img/11.jpg.

I am running lychee on the _site/ directory before sending it to GitHub pages, but when testing for this image link, for example, by default it tests it as file:///home/runner/work_site/al-folio/_site/projects/4_project/al-folio/assets/img/11.jpg instead of as file:///home/runner/work_site/al-folio/_site/assets/img/11.jpg. To force this behavior I give lychee the following parameters:

--offline --remap '_site(/?.*)/assets/(.*) _site/assets/$2' --verbose --no-progress '_site/**/*.html'

The remap argument helps me to use regex to fix the urls. You can check the GitHub action.
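To see what that remap rule does to a URL, here is a small imitation using Python's re module. It is not lychee's own code; it only mimics a single regex substitution, with lychee's `$2` written as `\2` in Python syntax. The input path below is a made-up example (the /srv/build prefix is hypothetical).

```python
import re

# Same pattern as in the --remap argument above.
PATTERN = re.compile(r"_site(/?.*)/assets/(.*)")
# "$2" in the remap rule corresponds to "\2" in Python's replacement syntax.
REPLACEMENT = r"_site/assets/\2"


def remap(url: str) -> str:
    """Apply the rule once to the leftmost match (hypothetical helper)."""
    return PATTERN.sub(REPLACEMENT, url, count=1)


# Hypothetical input: a page-relative resolution of /al-folio/assets/img/11.jpg
before = "file:///srv/build/_site/projects/4_project/al-folio/assets/img/11.jpg"
print(remap(before))
```

Note that the pattern matches from the first occurrence of _site in the URL, so a parent directory whose name also contains _site would change the result.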

george-gca avatar Jan 15 '24 18:01 george-gca

Great workaround. In fact, I think --remap would work for most (all?) of the cases that --base-url and --root-path would be used for. @vanbroup, can you try to use --remap for your case? Of course, I still think we should have --base-url and --root-path at some point, because it is a very common operation.

mre avatar Jan 15 '24 23:01 mre

It's nice the new remap feature is so flexible, but I agree it would be beneficial to provide a simpler UX for this. 👍

dkarlovi avatar Apr 29 '24 11:04 dkarlovi