Partial link exclusion: check status code but skip fragments for certain sites
In some instances, it may be helpful to verify a site returns an expected status code, but skip fragment-checking. This is useful for sites which parse the fragment through JS.
One example is GitHub's line-number links, such as https://github.com/lycheeverse/lychee/blob/73dff8f56fae16b5ac47e3e2eac1bbb15cd52332/.gitignore#L1. While it may be desirable to verify that the fragment exists by checking whether the line number is available, that behaviour depends on the site and would have to be tediously implemented in lychee on a case-by-case basis. Instead, a workaround could be to let the user specify domains/URLs for which fragment checking is skipped. Hence this feature request.
Perhaps a special exclusion syntax could be added?
Interesting. If we end up implementing it, we need a clean syntax.
The workaround is to run lychee twice: once for inputs where fragments should be checked and once for inputs where they shouldn't.
I faced the same need, but for README headings and GitHub issue comments (.../README.md#heading or .../issues/1789#issuecomment-xxx).
I haven't tested this, but another workaround might be to remap away the fragments in the links where you don't want to check them. This would look something like
--remap '(https://github\.com/lycheeverse/lychee/blob/[^#]+)#L\d+$ $1'
If you're already using remaps, be aware that at most one remap can apply per URL.
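For anyone who wants to sanity-check the pattern before adding it to a lychee invocation, here is a rough Python sketch of the substitution. It only approximates the remap: lychee uses Rust's regex engine, Python writes the replacement as `\1` rather than `$1`, and `strip_line_fragment` and the README URL are made-up names for illustration, not part of lychee.

```python
import re

# The remap pattern from above, copied unchanged. Python's `re` is used here
# as a stand-in for lychee's Rust regex engine (an assumption of this sketch).
PATTERN = re.compile(r"(https://github\.com/lycheeverse/lychee/blob/[^#]+)#L\d+$")


def strip_line_fragment(url: str) -> str:
    """Hypothetical helper: drop a trailing #L<n> fragment, keep everything else."""
    return PATTERN.sub(r"\1", url)


with_line = (
    "https://github.com/lycheeverse/lychee/blob/"
    "73dff8f56fae16b5ac47e3e2eac1bbb15cd52332/.gitignore#L1"
)
# Illustrative URL with a non-line-number fragment.
with_heading = "https://github.com/lycheeverse/lychee/blob/master/README.md#installation"

print(strip_line_fragment(with_line))     # line-number fragment is removed
print(strip_line_fragment(with_heading))  # left as-is, fragment would still be checked
```

Note that a range fragment such as `#L1-L5` does not match `#L\d+$`, so the pattern would need widening if those links should be skipped too.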
@katrinafyi Thanks a lot for the neat workaround! I used
remap = [
'(?P<host>^https://github\.com)/(?P<path>.*)#(?P<anchor>.*)$ $host/$path/',
]
for my purposes.
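A quick way to see what this broader remap does is to replay it in Python. This is a sketch under assumptions, not lychee's actual behaviour: Python's `re` stands in for the Rust regex engine, the replacement is written `\g<host>/\g<path>/` instead of `$host/$path/`, and the `remap` function is a hypothetical helper.

```python
import re

# Pattern and replacement from the config above, translated to Python syntax.
PATTERN = re.compile(r"(?P<host>^https://github\.com)/(?P<path>.*)#(?P<anchor>.*)$")
REPLACEMENT = r"\g<host>/\g<path>/"


def remap(url: str) -> str:
    """Hypothetical helper: apply the substitution, or return the URL unchanged."""
    return PATTERN.sub(REPLACEMENT, url)


# Any fragment on github.com is dropped, and a trailing slash is appended.
print(remap("https://github.com/lycheeverse/lychee/issues/1789#issuecomment-xxx"))

# URLs without a fragment, or on other hosts, do not match and pass through.
print(remap("https://github.com/lycheeverse/lychee"))
print(remap("https://example.com/page#section"))
```

Unlike the line-number pattern, this one strips every fragment on github.com, so no GitHub anchors get checked at all. That is the trade-off for covering headings and issue comments in one rule.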
Remaps are really powerful and a bit weird. 😆