urlchecker-action
Worry that the checker is actually downloading the content at the checked URL
For some reason, this goes pretty slowly. I am working from this document and it takes quite a while to complete a check. Next, I notice that on .pdf files it stalls for longer, especially the one at the ftp link.
So, this has me worried that it is actually fully downloading the content to check the link. I've seen similar behavior in the Sphinx URL checking feature too. It should really just fetch the headers at each URL and not the full content.
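A quick way to see the difference (with a placeholder URL) is to compare what each request actually transfers:

```python
import requests

url = "http://example.com/paper.pdf"  # placeholder; any large file shows the effect

# GET downloads the entire body before the call returns.
full = requests.get(url)
print(len(full.content))

# HEAD returns only the headers; the size is reported, not transferred.
head = requests.head(url)
print(head.headers.get("Content-Length"))
```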
Is this something you've looked into?
We should do HEAD instead of GET, I agree. We haven't looked into it, but we can.
Would you like me to update the branch we are working on to try it out?
ChatGPT suggests something along the lines of
```python
import requests

def is_url_working(url):
    try:
        response = requests.head(url)
        # You might also want to check for redirects (response.status_code == 302)
        if response.status_code == 200:
            return True
        elif response.status_code == 405:  # head requests disallowed
            retval = False
            response = requests.get(url, stream=True)
            if response.status_code == 200:
                retval = True
            response.close()  # Make sure to close the response
            return retval
    except requests.RequestException as e:
        print(f"An error occurred: {e}")
        return False

# Example usage
url = "http://example.com"
if is_url_working(url):
    print(f"The URL {url} is working.")
else:
    print(f"The URL {url} is not working.")
```
Oh geez, ChatGPT? 🙃 In the first lines I see issues:
- Following redirects is a parameter on head (`allow_redirects`); see https://requests.readthedocs.io/en/latest/user/quickstart/#redirection-and-history (a minimal sketch follows this list).
- The retval and response.close() don't make sense.
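For reference, a minimal sketch of following redirects on a HEAD request (placeholder URL):

```python
import requests

# requests.head() does not follow redirects by default;
# allow_redirects=True chases 3xx responses to the final URL.
response = requests.head("http://example.com", allow_redirects=True, timeout=10)
print(response.status_code)  # status of the final destination
print(response.history)      # any intermediate redirect responses
```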
I appreciate the suggestion, but I don't think the quality of code from AI tools is very good. It's mostly copy-pasting some poor soul's code from somewhere else on GitHub. I'm happy to write this with my own knowledge and careful inspection of core docs and library code to get the functionality I want.
But I have to take it back: it does look like response.close() is useful for requests.get()! Geez, I've been writing Python a long time and I just don't see it very often. So I learned something from ChatGPT! I appreciate the post, and I'll try to be more open-minded about it (even if I don't use it)!
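For what it's worth, the pattern I'd reach for makes the close implicit by using the response as a context manager, e.g.:

```python
import requests

# Using the response as a context manager releases the connection
# automatically, so no explicit response.close() is needed.
with requests.get("http://example.com", stream=True, timeout=10) as response:
    ok = response.status_code == 200
print(ok)
```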
So, I just happened to see this Q&A with Linus Torvalds about AI tools in coding...
haha I totally watched that! I'll watch again tonight with new context.
So, the more I think about this, the more I think HEAD requests are most likely to be useful for links to non-.html content (e.g., .pdf or other binary content a link references). For most .html content, the cost to download is likely minimal.
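If it helps, here's a rough sketch of that direction: HEAD first with redirects enabled, and a streamed GET fallback for servers that disallow HEAD. The function name and timeout are just illustrative, not the final implementation:

```python
import requests

def check_url(url, timeout=10):
    """Return True if url responds successfully, fetching headers only
    where possible, with a streamed GET fallback for servers that
    disallow HEAD."""
    try:
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        if response.status_code == 405:  # HEAD not allowed on this server
            # stream=True defers downloading the body; the context manager
            # releases the connection without reading it.
            with requests.get(url, stream=True, allow_redirects=True,
                              timeout=timeout) as get_response:
                return get_response.ok
        return response.ok
    except requests.RequestException as e:
        print(f"An error occurred: {e}")
        return False
```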
Agree! Let me work hard today (just presented at FOSDEM) and maybe I can do some work on this later if I'm productive!