webpreview
duplicate requests sometimes not necessary
Hi,
Thanks for your work; it is very useful. Why do you make a second request if the first one works?
    try:
        res = requests.get(url, timeout=timeout, headers=headers)
    except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
        raise URLUnreachable("The URL does not exist.")
    except MissingSchema:  # if no schema add http as default
        url = "http://" + url
        # throw URLUnreachable exception for just incase
        try:
            res = requests.get(url, timeout=timeout, headers=headers)
        except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
            raise URLUnreachable("The URL is unreachable.")
Also, you can reduce the failure rate of the first block by checking for a schema before any request is made (with a regex). That would allow you to merge the two blocks into one.
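To illustrate the regex idea, here is a minimal sketch. The helper name `ensure_scheme` and its `default` parameter are hypothetical and not part of webpreview; the pattern follows the general scheme syntax (a letter followed by letters, digits, `+`, `.` or `-`, then `://`).

```python
import re

# Matches a leading scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def ensure_scheme(url, default="http"):
    """Hypothetical helper: prefix a default scheme when the URL has none."""
    if _SCHEME_RE.match(url):
        return url  # a scheme is already present, leave the URL untouched
    return f"{default}://{url}"
```

With the URL normalized this way up front, a single try/except around one `requests.get` call should be enough, and the `MissingSchema` branch should no longer be needed.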
I'm a little late to the party, coming 4 years after the original question, but that's a good point. The first request can also be avoided when the content of the page is supplied. We can also check for the missing schema using a regex or the urlparse function, without making a request.
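A rough sketch of both points, assuming nothing about how either package implements them: `with_default_scheme`, `load_html` and the `fetch` parameter are hypothetical names used only for illustration. The check requires both a scheme and a host, so inputs like `example.com:8080/path` are still prefixed.

```python
from urllib.parse import urlparse


def with_default_scheme(url, default="http"):
    """Return url unchanged if it has a scheme and a host, else prefix one."""
    parsed = urlparse(url)
    if parsed.scheme and parsed.netloc:
        return url
    return f"{default}://{url}"


def load_html(url, content=None, fetch=None):
    """Hypothetical helper: use supplied content, otherwise fetch the page once."""
    if content is not None:
        return content  # the caller already has the page, so no request is made
    if fetch is None:
        raise ValueError("no content supplied and no fetch callable given")
    # fetch is any callable taking a URL and returning HTML,
    # for example a thin wrapper around requests.get.
    return fetch(with_default_scheme(url))
```

Passing the fetcher in as a callable is just a convenience for this sketch: it keeps the example free of network access and makes the "no request when content is supplied" behaviour easy to check.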
If you are looking for a package that avoids the duplicated request and adds slightly improved parsing on top, I can recommend the fork I made, web2preview. It is fully compatible with this package, so the same API can be used.