lychee strips final dot from link and this leads to 404

Open soredake opened this issue 2 years ago • 6 comments

log:

❯ echo "https://archive.org/details/23the.amazing.spiderman.the.deadly.dust." | lychee -
⠈ 1/1 [00:00:13] █████████████████████████ ✗ [404] https://archive.org/details/23the.amazing.spiderman.the.deadly.dust |
Issues found in 1 input. Find details below.

[stdin]:
✗ [404] https://archive.org/details/23the.amazing.spiderman.the.deadly.dust | Failed: Network error: Not Found

🔍 1 Total ✅ 0 OK 🚫 1 Error (HTTP:1)

Affected links for testing:
https://ru.wikipedia.org/wiki/%D0%9F%D0%BE%D1%81%D0%BB%D0%B5_%D0%B4%D0%BE%D0%B6%D0%B4%D0%B8%D1%87%D0%BA%D0%B0,_%D0%B2_%D1%87%D0%B5%D1%82%D0%B2%D0%B5%D1%80%D0%B3...
https://archive.org/details/23the.amazing.spiderman.the.deadly.dust.

Feb 02 '23 18:02 soredake

Likely a bug caused by linkify.

Feb 02 '23 18:02 lebensterben

Yes, @soredake can you report that over at https://github.com/robinst/linkify?

Feb 02 '23 18:02 mre

Yes, @soredake can you report that over at robinst/linkify?

Will do this later.

Feb 02 '23 18:02 soredake

Reported https://github.com/robinst/linkify/issues/57

Feb 23 '23 15:02 soredake

See https://github.com/robinst/linkify/issues/57#issuecomment-1605280936. This will need to be handled in lychee itself.

Jun 24 '23 05:06 robinst

@robinst, from your original comment:

try both variants and if one is fine, pass the check

Sounds easy in theory, but in practice we'd end up parsing the input twice: once for the initial check using the link extracted by linkify, and once more when we look up the stripped suffix as a fallback. We could add a lookahead field to the raw links to avoid the second parsing step, but a single lookahead character won't be sufficient, as the original ru.wikipedia.org link shows: it ends in three dots. The lookahead could potentially be unbounded.

I don't have a good idea on how to solve this right now. Open for suggestions.
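To make the trade-off concrete, here is a minimal sketch of the "lookahead field" idea in plain Rust. Everything in it is an assumption for illustration: `RawLink`, `with_trailing_dots`, and `candidates` are hypothetical names, not lychee's or linkify's actual types or API, and the link is located with a plain substring search rather than through the extractor. It only shows how an unbounded run of trailing dots could be captured once and reused for the fallback check.

```rust
/// Hypothetical stand-in for an extracted link plus the dots that the
/// extractor dropped right after it. Not lychee's actual raw link type.
#[derive(Debug, PartialEq)]
struct RawLink<'a> {
    /// The link as the extractor returned it (trailing dots stripped).
    link: &'a str,
    /// The run of dots that immediately followed the link in the input.
    trailing_dots: &'a str,
}

/// Find `link` in `input` and capture every dot that directly follows it.
/// The run is deliberately unbounded: one lookahead character would miss
/// a link that ends in "...".
fn with_trailing_dots<'a>(input: &'a str, link: &'a str) -> Option<RawLink<'a>> {
    let start = input.find(link)?;
    let rest = &input[start + link.len()..];
    // '.' is a single byte in UTF-8, so this byte count is a valid slice end.
    let dots = rest.len() - rest.trim_start_matches('.').len();
    Some(RawLink {
        link,
        trailing_dots: &rest[..dots],
    })
}

/// Candidate URLs to check, in order: the stripped link first, then the
/// link with the full run of dots restored (only if any were stripped).
fn candidates(raw: &RawLink) -> Vec<String> {
    let mut out = vec![raw.link.to_string()];
    if !raw.trailing_dots.is_empty() {
        out.push(format!("{}{}", raw.link, raw.trailing_dots));
    }
    out
}
```

A checker could request the candidates in order and pass the link as soon as one succeeds, which avoids a second parse of the input. The sketch only tries the fully stripped and fully restored variants; whether intermediate variants (one or two dots restored) are worth extra requests is an open design question.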

Jun 26 '23 17:06 mre
