Improve LinkedIn scraper robustness

Open lluissalord opened this issue 1 year ago • 1 comment

While reviewing some posts about scraping LinkedIn Jobs, I found that the same data extracted from https://www.linkedin.com/jobs/view/<JOB_ID> is served by https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/<JOB_ID>.

This makes me think that using the second URL could make the scraper more robust against being blocked by LinkedIn. I haven't tried it, but it would be interesting to run a stress test to see whether it behaves better.

lluissalord avatar May 02 '24 07:05 lluissalord

We should be able to test this by calling each endpoint x times and seeing which one returns a 429 first. However, we already get 429s quickly on the jobs search page, and it uses the same API format as that endpoint.

cullenwatson avatar May 04 '24 07:05 cullenwatson
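
For anyone who wants to try the comparison described above, here is a minimal sketch in Python using only the standard library. It is not code from JobSpy: the names `fetch_status`, `requests_until_429`, and `compare`, the `User-Agent` header, and the default of 50 requests are all assumptions made for illustration. It counts how many requests each URL pattern accepts before the first 429.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# The two URL patterns mentioned in this thread.
VIEW_URL = "https://www.linkedin.com/jobs/view/{job_id}"
GUEST_API_URL = "https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/{job_id}"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send one GET request and return the HTTP status code."""
    # The User-Agent value is an arbitrary placeholder (assumption).
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is on the exception.
        return err.code


def requests_until_429(
    url: str,
    max_requests: int,
    get_status: Callable[[str], int] = fetch_status,
    delay: float = 0.0,
) -> Optional[int]:
    """Return the 1-based index of the first request answered with 429,
    or None if no 429 was seen within max_requests."""
    for attempt in range(1, max_requests + 1):
        if get_status(url) == 429:
            return attempt
        if delay:
            time.sleep(delay)
    return None


def compare(
    job_id: str,
    max_requests: int = 50,
    get_status: Callable[[str], int] = fetch_status,
) -> Dict[str, Optional[int]]:
    """Run the same test against both URL patterns, one after the other."""
    targets = (("view", VIEW_URL), ("guest_api", GUEST_API_URL))
    return {
        name: requests_until_429(template.format(job_id=job_id), max_requests, get_status)
        for name, template in targets
    }
```

Calling `compare("<JOB_ID>")` with a real job ID returns a dict such as `{"view": ..., "guest_api": ...}`, where a larger number (or `None`) means that endpoint tolerated more requests. Because the two runs are sequential and come from the same IP, the first run may use up part of a shared rate limit, so it is probably worth repeating the test in the opposite order before drawing conclusions.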