
Connection pool error

Open ErpmeDerp opened this issue 2 years ago • 6 comments

Hi all, first of all, hats off for this piece of code. Very useful.

I am getting the following error while running the authenticated version (the anonymous version seems to run fine for the results it can see).

INFO:li:scraper:('[data engineer][european union]', 'Opening https://www.linkedin.com/jobs/search?keywords=data+engineer&location=european+union&sortBy=DD&f_TPR=r2592000&f_JT=F&f_E=1&start=0')
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: 127.0.0.1. Connection pool size: 1
WARNING:li:scraper:('[data engineer][european union]', 'Error in response', 'https://www.linkedin.com/jobs/search/?currentJobId=3159886155&f_E=1&f_JT=F&f_TPR=r2592000&keywords=data%20engineer&location=european%20union&sortBy=DD', 'request_id=63028.200 status=404 type=XHR mime_type=application/vnd.linkedin.normalized+json+2.1 url=https://www.linkedin.com/voyager/api/voyagerMessagingDashAwayStatus')
WARNING:li:scraper:('[data engineer][european union]', 'No jobs found, skip')
[ON_END]
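Side note on the first warning: the `Connection pool is full` message comes from urllib3's connection pool for the local Selenium driver (note the `127.0.0.1`) and is usually harmless on its own; the `404` on the voyager API is the more likely culprit. If the pool warning is just log noise, it can be silenced without hiding the scraper's own messages (a minimal sketch):

```python
import logging

# Keep the scraper's INFO-level output, but silence only the
# urllib3 connection-pool warnings coming from the local driver.
logging.basicConfig(level=logging.INFO)
logging.getLogger('urllib3.connectionpool').setLevel(logging.ERROR)
```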

Does anyone know what the issue could be here? This is the code I am using:

import logging
import csv
from sys import maxsize
from linkedin_jobs_scraper import LinkedinScraper
from linkedin_jobs_scraper.events import Events, EventData, EventMetrics
from linkedin_jobs_scraper.query import Query, QueryOptions, QueryFilters
from linkedin_jobs_scraper.filters import RelevanceFilters, TimeFilters, TypeFilters, ExperienceLevelFilters, RemoteFilters

# Change root logger level (default is WARN)
logging.basicConfig(level = logging.INFO)

job_data = []

# Fired once for each successfully processed job
def on_data(data: EventData):
    job_data.append([data.title, data.company, data.place, data.date, data.link])

# Fired once for each page (25 jobs)
def on_metrics(metrics: EventMetrics):
    print('[ON_METRICS]', str(metrics))

def on_error(error):
    print('[ON_ERROR]', error)

def on_end():
    print('[ON_END]')

scraper = LinkedinScraper(
    chrome_options=None,  # You can pass your custom Chrome options here
    headless=False,
    max_workers=1,  # How many threads will be spawned to run queries concurrently (one Chrome driver per thread)
    slow_mo=2,  # Slow down the scraper to avoid 'Too many requests (429)' errors
    page_load_timeout=25  # Page load timeout (in seconds)
)

# Add event listeners
scraper.on(Events.DATA, on_data)
scraper.on(Events.METRICS, on_metrics)
scraper.on(Events.ERROR, on_error)
scraper.on(Events.END, on_end)

queries = [
    Query(
        query='data engineer',
        options=QueryOptions(
            locations=['european union'],
            apply_link=False, 
            optimize=False,
            limit=100,
            filters=QueryFilters(
                relevance=RelevanceFilters.RECENT,
                time=TimeFilters.MONTH,
                type=[TypeFilters.FULL_TIME],
                experience=[ExperienceLevelFilters.INTERNSHIP],
            )
        )
    ),
]

scraper.run(queries)

fields = ['Job', 'Company', 'Place', 'Date', 'Link']
rows = job_data  # each entry is already [title, company, place, date, link]

with open('jobs_data.csv', 'w', newline='') as f:
    write = csv.writer(f)

    write.writerow(fields)
    write.writerows(rows)

ErpmeDerp avatar Aug 09 '22 11:08 ErpmeDerp

Same error here with an authenticated session. Running it with headless=False I can see the jobs on the page, but for some reason I keep getting the "No jobs found" error. It was running properly until yesterday evening (08/08/2022).

Joko75 avatar Aug 09 '22 18:08 Joko75

Same error here. Have you guys tried using another account's cookie? I did, and got correct job results with exactly the same URL.

I think LinkedIn just applied some anti-scraper measures that block certain sessions, including yours and mine. But they only block the sessions driven by Selenium, not the actual Chrome browser (the same blocked account can see the jobs when visiting the URL in a normal browser).

So maybe someone can fix this by simulating the exact same browser behavior from Selenium?
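Following up on the "simulate a real browser" idea: a minimal, unverified sketch of flags that could be passed through the `chrome_options` parameter that `LinkedinScraper` already exposes. The flag names are real Chromium switches; the helper function name is mine, and whether these flags actually avoid LinkedIn's detection is an open question.

```python
# Hedged sketch, not a verified fix: Chromium flags sometimes used to make a
# Selenium-driven Chrome look closer to a regular browser session.
STEALTH_FLAGS = [
    '--disable-blink-features=AutomationControlled',
    '--window-size=1920,1080',
    '--lang=en-US',
]

def build_chrome_options(flags=STEALTH_FLAGS):
    """Build a selenium ChromeOptions object with the given flags.

    The selenium import is deferred so the flag list itself can be
    inspected without selenium installed.
    """
    from selenium import webdriver
    opts = webdriver.ChromeOptions()
    for flag in flags:
        opts.add_argument(flag)
    return opts

# Hypothetical usage with the scraper from this thread:
# scraper = LinkedinScraper(chrome_options=build_chrome_options(), headless=False, max_workers=1)
```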

VincentChanLivAway avatar Aug 11 '22 09:08 VincentChanLivAway

Same for me. As OP said - thank you for the code. Gamechanger for my job search. Anonymous mode works somewhat - but once I launch the authenticated sessions, it successfully logs in, but once the results are visible, the bot times out.

leonpawelzik avatar Aug 12 '22 09:08 leonpawelzik

It seems LinkedIn has changed the CSS class of one of their HTML elements. Try the latest version and see if it fixes the "No jobs found, skip" issue.
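For anyone unsure how to pick up the fix: upgrading via pip should be enough (assuming the PyPI package name `linkedin-jobs-scraper`, which is what the `linkedin_jobs_scraper` import suggests):

```shell
# Upgrade to the latest release, then confirm the installed version.
pip install --upgrade linkedin-jobs-scraper
pip show linkedin-jobs-scraper
```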

spinlud avatar Aug 12 '22 16:08 spinlud

Seems to work fine now, thanks @spinlud !

Joko75 avatar Aug 12 '22 18:08 Joko75

In my case the session cookie is invalid??? I don't understand:

[screenshot omitted]

I select li_at....
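For reference, this library's authenticated mode reads the cookie from the `LI_AT_COOKIE` environment variable, and the value should be the raw `li_at` cookie copied from a logged-in browser session (just the value, no `li_at=` prefix or quotes). A minimal sketch of setting it before the scraper starts, with a placeholder value:

```python
import os

# Placeholder only: paste the raw li_at cookie value copied from a
# logged-in browser session. It must be set before the scraper runs.
os.environ['LI_AT_COOKIE'] = '<your li_at cookie value>'

assert os.environ['LI_AT_COOKIE'], 'LI_AT_COOKIE must not be empty'
```

Equivalently, it can be set on the command line: `LI_AT_COOKIE=<value> python your_script.py`. If the cookie was copied with extra characters, has expired, or belongs to a logged-out session, the library will report it as invalid.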

PARODBE avatar Sep 27 '22 13:09 PARODBE