notfoundbot
                        False results for domains
- https://www.930.com/
- https://jade-lang.com/
These got updated to archive URLs, but they're actually online. Need to figure out why.
Okay, so observations so far:
The 930 club is using a "Sucuri Cloud Proxy" that seems to identify notfoundbot's requests as a DDoS. I've tried the basics to figure out what is informing that silly proxy that it's a bot, but haven't found anything clear so far: curl works, even if I disable all of curl's default headers.
For https://jade-lang.com/ the issue is the SSL certificate, which works for Firefox and Chrome, but not for Node. Options I see so far are either disabling strict SSL checks entirely, or loading a wider set of SSL root certs using something like https://github.com/arvind-agarwal/node_extra_ca_certs_mozilla_bundle - see the sketch below.
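Roughly what the second option could look like - only falling back to the wider bundle when the default roots fail. The bundle path and the set of error codes here are guesses, not anything notfoundbot does today:

```typescript
import * as https from "https";
import * as fs from "fs";

// Path to a bundle generated with node_extra_ca_certs_mozilla_bundle;
// this exact path is an assumption, adjust to wherever the bundle lands.
const mozillaCerts = fs.readFileSync("./ssl_cert/mozilla_ca_cert.pem");

function headRequest(url: string, ca?: Buffer): Promise<number> {
  return new Promise((resolve, reject) => {
    const req = https.request(url, { method: "HEAD", ca }, (res) => {
      res.resume(); // drain the response so the socket is freed
      resolve(res.statusCode ?? 0);
    });
    req.on("error", reject);
    req.end();
  });
}

// Certificate-verification error codes Node can report when its built-in
// roots can't verify the chain; only these trigger the fallback.
const CERT_ERRORS = new Set([
  "UNABLE_TO_VERIFY_LEAF_SIGNATURE",
  "UNABLE_TO_GET_ISSUER_CERT_LOCALLY",
  "SELF_SIGNED_CERT_IN_CHAIN",
]);

async function checkWithWiderRoots(url: string): Promise<number> {
  try {
    return await headRequest(url);
  } catch (err: any) {
    if (CERT_ERRORS.has(err?.code)) {
      // Retry once with the Mozilla bundle instead of Node's default roots.
      return headRequest(url, mozillaCerts);
    }
    throw err;
  }
}
```

The lazier variant is to just point NODE_EXTRA_CA_CERTS at the generated bundle when launching the process, which avoids touching the request code at all.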
I've got a few more that appear dead but aren't. https://github.com/agrc/gis.utah.gov/pull/1630/files
- http://www.exploreutah.com/GettingAround/Navigating_Utahs_Streets.shtml
- https://ugic.org/
curl works for both of these URLs ¯\_(ツ)_/¯
Adding some more to the list:
https://dc.gov/: something is very weird with the SSL configuration on this one. The first time I curl it, I get:
➜  ~ curl https://dc.gov/
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to dc.gov:443
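Since it only fails on the first attempt, one option is to retry transient handshake failures before flagging the URL. A rough sketch - the retry count and delay are arbitrary, and `checkUrl` stands in for whatever request notfoundbot already makes:

```typescript
async function checkWithRetries(
  checkUrl: (url: string) => Promise<number>,
  url: string,
  attempts = 3
): Promise<number> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await checkUrl(url);
    } catch (err) {
      lastError = err;
      // Brief pause before retrying what may be a transient handshake failure.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
  throw lastError;
}
```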
Thoughts on creating an exceptions list for the repeat offender links that aren't rotten?
Yep, exactly, I think that's a great idea.
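Roughly what I'm imagining - the file name and format here are hypothetical, nothing like this exists yet:

```typescript
import * as fs from "fs";

// One URL prefix per line; lines starting with # are comments.
const exceptions: string[] = fs.existsSync(".notfoundbot-ignore")
  ? fs
      .readFileSync(".notfoundbot-ignore", "utf8")
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line && !line.startsWith("#"))
  : [];

// Skip a URL if it starts with any listed prefix.
function isException(url: string): boolean {
  return exceptions.some((prefix) => url.startsWith(prefix));
}
```

Then https://www.930.com/ and https://jade-lang.com/ could go straight into that file.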
I keep getting this false positive: https://github.com/cmudig/cmudig.github.io/pull/50. Maybe it's related.
Yeah, trying that with curl:
$ curl https://athletics.cmu.edu/athletics/mascot/index
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>403 ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
<BR clear="all">
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: cBXZnut_EYl-4AVwmNzjF7Qkx9nmy3Z_bXdkIVDiwgxRsTAE_r1YxQ==
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>
CloudFront must be blocking requests with UA strings like curl/version? Any UA string that has the word curl in it fails, but the following works:
curl -A "do not mention the c word" https://athletics.cmu.edu/athletics/mascot/index
Using the UA string that you use in ~~linkrot~~ notfoundbot works just fine.
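For reference, setting an explicit User-Agent on the check is what keeps CloudFront's "curl" match from firing. The UA value below is just illustrative, not the exact string notfoundbot sends:

```typescript
import * as https from "https";

https.get(
  "https://athletics.cmu.edu/athletics/mascot/index",
  { headers: { "User-Agent": "notfoundbot link checker" } },
  (res) => {
    console.log(res.statusCode);
    res.resume(); // drain the body, we only care about the status
  }
);
```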