
Enhancement - Keep scraping on status code 429 instead of exiting - LinkedIn

Open · muzaT opened this issue 3 months ago · 2 comments

Hi @cullenwatson!

I was wondering if we could add a simple check for status code 429. If the status code is 429, the scraper should keep retrying the request until it receives status code 200 instead of exiting, because after a 429 we can simply refresh and resume. This would help most with proxy providers that rotate IPs automatically. This suggestion is specifically for LinkedIn.

Something like this, for proxies without auto-rotation:

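```python
import time

import requests

# `url` and `headers` are assumed to be defined by the scraper.
page = requests.get(url, headers=headers)
if page.status_code == 429:
    print("Error fetching page, Error: " + str(page.status_code))
    # Keep retrying until LinkedIn answers with 200 OK.
    while page.status_code != 200:
        print("Retrying website!")
        time.sleep(5)  # pause between attempts (added) so retries don't hammer LinkedIn
        page = requests.get(url, headers=headers)
```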

If the proxy provider offers auto-rotation, we could add a new parameter such as `auto_rotating_proxy=True` and run something like this instead:

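```python
import time

import requests

# `url`, `headers` and `proxies` (the provider's auto-rotating gateway) are assumed defined.
page = requests.get(url, headers=headers, proxies=proxies)
if page.status_code != 200:
    print("Error fetching page, Error: " + str(page.status_code))
    # Each retry leaves through a new IP, so any error is worth retrying.
    while page.status_code != 200:
        print("Retrying website!")
        time.sleep(1)  # short pause between attempts (added)
        page = requests.get(url, headers=headers, proxies=proxies)
```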

With an auto-rotating proxy, every retry after an error goes out through a different IP. I hope this helps the community and users.

muzaT · Mar 12 '24 07:03
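The two snippets in the proposal differ only in which status codes trigger a retry, so one way to package them is a single helper with an upper bound on attempts. This is a sketch under assumptions, not JobSpy code: `fetch_with_retry`, `max_attempts`, `delay` and the injectable `get` callable are hypothetical names, and `auto_rotating_proxy` mirrors the parameter suggested in the issue.

```python
import time
from typing import Callable, Optional

import requests


def fetch_with_retry(
    url: str,
    headers: dict,
    auto_rotating_proxy: bool = False,
    proxies: Optional[dict] = None,
    max_attempts: int = 10,
    delay: float = 2.0,
    get: Callable[..., requests.Response] = requests.get,
) -> requests.Response:
    """Hypothetical helper combining both proposed snippets, capped at max_attempts requests.

    Without rotation only 429 is retried (same IP, so back off and try again).
    With auto_rotating_proxy=True any non-200 is retried, assuming the provider
    hands out a new IP per request.
    """
    page = get(url, headers=headers, proxies=proxies)
    for attempt in range(1, max_attempts):
        if auto_rotating_proxy:
            should_retry = page.status_code != 200
        else:
            should_retry = page.status_code == 429
        if not should_retry:
            break
        print(f"Error fetching page, Error: {page.status_code}. Retry {attempt}")
        time.sleep(delay * attempt)  # linear backoff between attempts
        page = get(url, headers=headers, proxies=proxies)
    return page
```

Capping the number of attempts avoids the infinite loop the `while True` version would enter if LinkedIn kept answering 429 for the same IP; the caller still gets the last response back and can decide what to do with it.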

I think the right way to handle this here is to give the user the session, so they can handle the responses however they want using requests hooks. Agree?

ZacharyHampton · Mar 12 '24 10:03
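To illustrate what handling responses "using requests hooks" could look like, here is a sketch of a response hook that retries 429s. It assumes the caller has access to the `requests.Session` the scraper uses, which JobSpy may not expose today; `make_retry_on_429_hook`, `max_retries` and `delay` are hypothetical names. Re-sending via `response.connection.send(...)` from inside a hook is the same pattern requests' own `HTTPDigestAuth` uses to retry a request.

```python
import time

import requests


def make_retry_on_429_hook(max_retries: int = 5, delay: float = 2.0):
    """Build a requests "response" hook that re-sends a request while it gets HTTP 429."""

    def hook(response, *args, **kwargs):
        attempts = 0
        while response.status_code == 429 and attempts < max_retries:
            attempts += 1
            time.sleep(delay * attempts)  # linear backoff: delay, 2*delay, 3*delay, ...
            request = response.request.copy()
            response.content  # drain the 429 body so the connection can go back to the pool
            response.close()
            # Re-send through the same transport adapter. Adapter.send does not run
            # session hooks, so this does not recurse; kwargs carry timeout, proxies, etc.
            response = response.connection.send(request, **kwargs)
        return response  # a hook's return value replaces the response the caller sees

    return hook


# Hypothetical usage, assuming the scraper handed its requests.Session to the caller:
session = requests.Session()
session.hooks["response"].append(make_retry_on_429_hook(max_retries=3, delay=1.0))
```

Because the hook lives on the session, every request made through it gets the same retry policy without the scraping code itself knowing about 429s.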

@ZacharyHampton Yes, that would work well, but a newcomer might find it a bit difficult to handle. We could provide both: a parameter such as `session_response` which, if true, automatically handles 429s along the lines of the solution I proposed, and if false, hands the session to the user to handle. How does that sound?

muzaT · Mar 12 '24 10:03
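A minimal sketch of the two-mode idea, assuming the library builds its own `requests.Session`. `build_session` and `auto_retry` are hypothetical names standing in for the proposed `session_response` flag. Note that the automatic mode swaps the hand-written retry loop for a different technique: urllib3's `Retry` policy mounted through `requests.adapters.HTTPAdapter`.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(auto_retry: bool = True, max_retries: int = 5) -> requests.Session:
    """Hypothetical factory behind a flag like the proposed `session_response` option.

    auto_retry=True: the library retries HTTP 429 itself (beginner-friendly default).
    auto_retry=False: a plain Session is returned for the user to configure.
    """
    session = requests.Session()
    if auto_retry:
        retry = Retry(
            total=max_retries,
            status_forcelist=[429],  # only retry "Too Many Requests"
            backoff_factor=1,  # exponential backoff between attempts
            respect_retry_after_header=True,  # honour Retry-After if the server sends it
            raise_on_status=False,  # return the last 429 instead of raising RetryError
        )
        adapter = HTTPAdapter(max_retries=retry)
        session.mount("https://", adapter)
        session.mount("http://", adapter)
    return session
```

With `auto_retry=False` the caller gets a plain session and can attach their own response hooks or adapters, which matches the suggestion to hand users the session; with `auto_retry=True` nothing extra is needed from a newcomer.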