lychee
lychee strips the final dot from a link, which leads to a 404
log:
❯ echo "https://archive.org/details/23the.amazing.spiderman.the.deadly.dust." | lychee -
⠈ 1/1 [00:00:13] █████████████████████████ ✗ [404] https://archive.org/details/23the.amazing.spiderman.the.deadly.dust
Issues found in 1 input. Find details below.
[stdin]:
✗ [404] https://archive.org/details/23the.amazing.spiderman.the.deadly.dust | Failed: Network error: Not Found
🔍 1 Total ✅ 0 OK 🚫 1 Error (HTTP:1)
Affected links for testing:
https://ru.wikipedia.org/wiki/%D0%9F%D0%BE%D1%81%D0%BB%D0%B5_%D0%B4%D0%BE%D0%B6%D0%B4%D0%B8%D1%87%D0%BA%D0%B0,_%D0%B2_%D1%87%D0%B5%D1%82%D0%B2%D0%B5%D1%80%D0%B3...
https://archive.org/details/23the.amazing.spiderman.the.deadly.dust.
Likely a bug caused by linkify.
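To make the failure mode concrete, here is a minimal sketch of a "drop trailing dots" rule. It is a hypothetical, simplified model (the function name `trim_trailing_dots` is invented for illustration) and is NOT linkify's actual implementation; it only shows how such a rule turns the valid archive.org URL into the one lychee reported as 404.

```rust
/// Hypothetical, simplified model of a trailing-punctuation rule.
/// NOT linkify's real code: it just removes every '.' at the end.
fn trim_trailing_dots(candidate: &str) -> &str {
    candidate.trim_end_matches('.')
}

fn main() {
    // The final '.' is part of the archive.org item identifier.
    let input = "https://archive.org/details/23the.amazing.spiderman.the.deadly.dust.";
    let extracted = trim_trailing_dots(input);

    // Dropping it produces a different URL, which the server answers with 404.
    assert_eq!(
        extracted,
        "https://archive.org/details/23the.amazing.spiderman.the.deadly.dust"
    );
    println!("{}", extracted);
}
```

The heuristic is reasonable for prose ("see https://example.org."), which is why it cannot simply be removed: the extractor alone cannot tell sentence punctuation from part of the URL.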
Yes. @soredake, can you report that over at https://github.com/robinst/linkify?
Reported https://github.com/robinst/linkify/issues/57
See https://github.com/robinst/linkify/issues/57#issuecomment-1605280936; this is something that will need to be handled in lychee itself.
@robinst, from your original comment:
try both variants and if one is fine, pass the check
Sounds easy in theory, but in practice we'd end up parsing the input twice: once for the initial check using the link extracted by linkify, and once more to look up the stripped suffix as a fallback. We could add another lookahead field to the raw links to avoid the second parsing step, but a single lookahead character won't be sufficient, as the original ru.wikipedia.org link shows: it ends with three dots. The required lookahead could potentially be unbounded.
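The "try both variants" idea can be sketched as follows, under stated assumptions: the checker still has the original input text and knows the byte offset where the extracted link ends (the parameter `link_end` is hypothetical, as are the helpers `trailing_dots` and `candidates`). Instead of storing a fixed-size lookahead, the sketch scans the whole run of dots after the link, so three trailing dots work as well as one. This is an illustration, not lychee's design.

```rust
/// Hypothetical helper: return the run of '.' characters that directly
/// follows the extracted link in the original input. It scans until the
/// run ends, so there is no fixed lookahead limit.
fn trailing_dots(input: &str, link_end: usize) -> &str {
    let rest = &input[link_end..];
    let len = rest.len() - rest.trim_start_matches('.').len();
    &rest[..len]
}

/// Hypothetical helper: the URLs to try, in order. The extracted link comes
/// first; each further candidate re-attaches one more of the stripped dots.
fn candidates(extracted: &str, suffix: &str) -> Vec<String> {
    (0..=suffix.len())
        .map(|n| format!("{}{}", extracted, &suffix[..n]))
        .collect()
}

fn main() {
    // Assumption: the extractor returned the link without its final dot.
    let input = "See https://archive.org/details/23the.amazing.spiderman.the.deadly.dust. for details";
    let extracted = "https://archive.org/details/23the.amazing.spiderman.the.deadly.dust";
    let link_end = input.find(extracted).unwrap() + extracted.len();

    let suffix = trailing_dots(input, link_end);
    // A checker would request each candidate in turn and pass if any succeeds.
    for url in candidates(extracted, suffix) {
        println!("{}", url);
    }
}
```

The trade-off mentioned above remains: this needs access to the source text (or at least the link's end offset) at check time, and every extra candidate costs an extra request when the first one fails.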
I don't have a good idea of how to solve this right now. Open to suggestions.