webpreview duplicate requests sometimes not necessary

Open qwilbird opened this issue 5 years ago • 1 comments

Hi,

Thanks for your work, it is very useful. Why do you make a second request if the first one succeeds?

try:
    res = requests.get(url, timeout=timeout, headers=headers)
except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    raise URLUnreachable("The URL does not exist.")
except MissingSchema:  # if no schema add http as default
    url = "http://" + url

# throw URLUnreachable exception for just incase
try:
    res = requests.get(url, timeout=timeout, headers=headers)
except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    raise URLUnreachable("The URL is unreachable.")

Also, you can reduce the failure rate of the first block by checking for a schema (with a regex) before any request is made. That would let you merge the two blocks into one.
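A rough sketch of what the merged block could look like. The names ensure_scheme and fetch are made up for illustration, and URLUnreachable here is only a stand-in for the library's own exception; the requests call and the exception classes are the ones from the snippet above.

```python
import re

# Matches a leading scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


class URLUnreachable(Exception):
    """Stand-in for the library's exception of the same name."""


def ensure_scheme(url):
    """Prepend http:// when the URL has no scheme, without making a request."""
    return url if _SCHEME_RE.match(url) else "http://" + url


def fetch(url, timeout=None, headers=None):
    """Hypothetical merged version: one scheme check, one request."""
    # Imported here so ensure_scheme can be tried without requests installed.
    import requests
    from requests.exceptions import (
        ConnectionError, HTTPError, Timeout, TooManyRedirects,
    )

    url = ensure_scheme(url)
    try:
        return requests.get(url, timeout=timeout, headers=headers)
    except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
        raise URLUnreachable("The URL is unreachable.")
```

With the scheme fixed up front, MissingSchema should no longer be raised for inputs like "example.com", so a single try block is enough.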

Jul 15 '18 19:07 qwilbird

I'm a little late to the party, coming 4 years after the original question, but that's a good point. The first request can also be avoided when the content of the page is supplied. And we can check for the missing schema using a regex or urlparse method, without making a request.
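To illustrate both ideas, here is a minimal sketch, not the actual code of either package. The names with_default_scheme and resolve_html and the fetch parameter are hypothetical, and it assumes only http(s) URLs are expected.

```python
from urllib.parse import urlparse


def with_default_scheme(url, default="http"):
    """Add a default scheme using urlparse, without making a request."""
    # urlparse("example.com") leaves .scheme empty, so the default is added.
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default}://{url}"


def resolve_html(url, content=None, fetch=None):
    """Return supplied content as-is; call fetch only when it is missing.

    fetch is any callable taking a URL and returning the page HTML
    (an assumption of this sketch, e.g. a thin wrapper around requests.get).
    """
    if content is not None:
        return content  # page content was supplied, no request needed
    if fetch is None:
        raise ValueError("either content or a fetch callable is required")
    return fetch(with_default_scheme(url))
```

For example, resolve_html("example.com", content=html) never touches the network, while resolve_html("example.com", fetch=my_fetch) calls my_fetch exactly once with "http://example.com".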

If you are looking for a package that avoids the duplicated request and adds slightly improved parsing on top, I can recommend a fork I made, web2preview. It is fully compatible with this package, so the same API can be used.

May 24 '22 18:05 vduseev
