Reddit links can not be checked? + Default list of exclusions
I found an older discussion on an issue with checking Reddit links and posted my observation there too: https://github.com/lycheeverse/lychee/discussions/1324#discussioncomment-13402967
After adding a link to a project-related subreddit to a README file, lychee always returns an error for that link:
[403] Network error: Forbidden
The link follows the usual pattern (https://www.reddit.com/r/SUBREDDITNAME/) and works just fine when opened directly in the browser.
Can somebody confirm this?
I had to add --exclude www.reddit.com to the args in lychee-action.
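For anyone hitting the same problem, the workaround looks roughly like the step below. This is only a sketch: the step name, the `@v2` version tag, and the `README.md` input are assumptions for illustration; the `--exclude www.reddit.com` argument is the part taken from my setup.

```yaml
# Sketch of a workflow step; step name, version tag, and input file are assumptions.
- name: Check links
  uses: lycheeverse/lychee-action@v2
  with:
    # --exclude takes a pattern; links matching it are skipped instead of checked
    args: --exclude www.reddit.com README.md
```

Excluded links are then counted as excluded in the summary rather than as errors.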
Reading https://github.com/lycheeverse/lychee-action/issues/53, I then found out that Twitter links, for example, are always excluded as false positives because they would always fail: https://github.com/lycheeverse/lychee/blob/master/lychee-lib/src/filter/mod.rs#L34-L39
Should Reddit also be listed there?
I also saw that certain w3.org links are skipped automatically:
https://github.com/lycheeverse/lychee/blob/master/lychee-lib/src/filter/mod.rs#L42-L52
But looking at e.g. https://www.w3schools.com/xml/schema_intro.asp, they actually list a different link not covered right now, which I circumvent by excluding www.w3.org in general:
http://www.w3.org/2001/XMLSchema-instance
It depends on the network connection. It works over here (at the moment):
echo 'https://www.reddit.com/r/rust/' | lychee -vvv -
[200] https://www.reddit.com/r/rust/
🔍 1 Total (in 0s) ✅ 1 OK 🚫 0 Errors
Maybe they blocked the GitHub IP Address range?
As for w3.org, could you create a PR? I think we should exclude that automatically.
I did not try with lychee directly as I'm only using the lychee-action GitHub Action. That's also why I opened the ticket here and not at the lychee repo.
Can you confirm my issue when trying the same URL with the action? Maybe they do indeed block GitHub IPs? 🤔
As for w3.org, could you create a PR? I think we should exclude that automatically.
https://github.com/lycheeverse/lychee/pull/1735 😎
Just tested using the lychee-action@master and can confirm:
[403] https://www.reddit.com/r/rust/ | Rejected status code (this depends on your "accept" configuration): Forbidden
# Summary
| Status | Count |
|---------------|-------|
| 🔍 Total | 1 |
| ✅ Successful | 0 |
| ⏳ Timeouts | 0 |
| 🔀 Redirected | 0 |
| 👻 Excluded | 0 |
| ❓ Unknown | 0 |
| 🚫 Errors | 1 |
## Errors per input
### Errors in README.md
* [403] <https://www.reddit.com/r/rust/> | Rejected status code (this depends on your "accept" configuration): Forbidden
Unfortunately, it looks like Reddit blocks GitHub (workflows) now. I also tried setting a different user-agent, but it didn't work.
Since it still works with the lychee binary, I see two options:
- Add Reddit to a "global" list of exclusions, which gets used in lychee-action.
- Do nothing and hope that GitHub will be unblocked.
What do you think?
Any feedback?
Thanks for answering the question I had a couple of hours ago. For most of the websites I get: Rejected status code (this depends on your "accept" configuration): Forbidden. So I will need to wait for them to allow requests from GitHub Actions, correct?
Yes, that is the case. There's nothing we can do about it. Reddit would have to unblock GitHub IP ranges from making requests.
I don't know if or when this ban will be lifted.
I don't think there is much to do in this issue anymore, so I'm closing it. In my opinion, we should not add Reddit to the global exclusion list, because requests still work on the command-line.