py-linkedin-jobs-scraper
add support for proxy auth - timeout exception
I tried using your library with proxy auth, which creates an extension and adds it to the Chrome options.
However, I get a TimeoutException because of this line in linkedin_scrappy.py:
    driver = build_driver(
        executable_path=self.chrome_executable_path,
        options=self.chrome_options,
        headless=self.headless,
        timeout=120  # what I had to add to make the proxy work; the default of 20 is not enough
    )
So it would be nice if the LinkedinScraper constructor accepted a WebDriver argument (which would allow using the Selenium-Wire library for proxy auth) or a timeout argument.
Additionally, it would be nice to let the user pass the per-job timeout used by anonymous_strategy.py:
def __load_job_details(driver: webdriver, selectors: Selectors, job_id: str, timeout=2) -> object:
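To make the request concrete, here is a minimal sketch of what I have in mind. It is not the library's actual code: ScraperSketch, driver_builder, page_load_timeout and job_timeout are hypothetical names I made up for illustration, and the real constructor may need a different shape.

```python
from typing import Any, Callable, Optional

DEFAULT_TIMEOUT = 20  # seconds; the current default mentioned above


class ScraperSketch:
    """Hypothetical stand-in for LinkedinScraper, not the library's real class."""

    def __init__(
        self,
        driver_builder: Callable[..., Any],
        driver: Optional[Any] = None,
        page_load_timeout: int = DEFAULT_TIMEOUT,
        job_timeout: float = 2,
    ):
        # driver_builder plays the role of build_driver in this sketch.
        self._driver_builder = driver_builder
        # A caller-supplied driver (e.g. one created by Selenium-Wire).
        self._driver = driver
        self.page_load_timeout = page_load_timeout
        # Would be forwarded to the per-job loader in the strategy.
        self.job_timeout = job_timeout

    def get_driver(self) -> Any:
        # Prefer the injected driver; otherwise build one with the
        # configurable timeout instead of a hard-coded value.
        if self._driver is not None:
            return self._driver
        return self._driver_builder(timeout=self.page_load_timeout)
```

With something like this, a slow proxy only needs page_load_timeout=120, and anyone who wants full control can pass their own driver.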
One way to use proxy auth with Chrome is through an extension, which can't work in headless mode:
    import zipfile

    def createProxyZip(PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS):
        # Extension manifest: requests the proxy and webRequest permissions.
        manifest_json = """
        {
            "version": "1.0.0",
            "manifest_version": 2,
            "name": "Chrome Proxy",
            "permissions": [
                "proxy",
                "tabs",
                "unlimitedStorage",
                "storage",
                "<all_urls>",
                "webRequest",
                "webRequestBlocking"
            ],
            "background": {
                "scripts": ["background.js"]
            },
            "minimum_chrome_version": "22.0.0"
        }
        """
        # Background script: sets a fixed proxy and answers the auth challenge.
        background_js = """
        var config = {
            mode: "fixed_servers",
            rules: {
                singleProxy: {
                    scheme: "http",
                    host: "%(host)s",
                    port: parseInt(%(port)s)
                },
                bypassList: []
            }
        };
        chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
        function callbackFn(details) {
            return {
                authCredentials: {
                    username: "%(user)s",
                    password: "%(pass)s"
                }
            };
        }
        chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
        );
        """ % {
            "host": PROXY_HOST,
            "port": PROXY_PORT,
            "user": PROXY_USER,
            "pass": PROXY_PASS,
        }
        # Pack both files into a zip that Chrome can load as an extension.
        pluginfile = 'proxy_auth_plugin.zip'
        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        return pluginfile

    # The original call passed no arguments; the function needs all four.
    chrome_options.add_extension(createProxyZip(PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS))
The other, preferred way is to use selenium-wire: https://github.com/wkeeling/selenium-wire
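For reference, a rough sketch of the selenium-wire route, assuming selenium-wire is installed and an HTTP proxy is used. build_seleniumwire_options and build_proxied_driver are helper names I made up; the seleniumwire_options argument and the "proxy" dictionary layout follow the selenium-wire README. The resulting driver is what I would like to be able to hand to LinkedinScraper.

```python
from urllib.parse import quote


def build_seleniumwire_options(host, port, user, password, scheme="http"):
    """Build the selenium-wire options dict for an authenticated proxy."""
    # Percent-encode the credentials so characters like '@' or ':' are safe in the URL.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{scheme}://{credentials}@{host}:{port}"
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def build_proxied_driver(chrome_options, host, port, user, password):
    """Create a Chrome driver whose traffic goes through the proxy."""
    # Imported lazily so the helper above works without selenium-wire installed.
    from seleniumwire import webdriver

    return webdriver.Chrome(
        options=chrome_options,
        seleniumwire_options=build_seleniumwire_options(host, port, user, password),
    )
```

Unlike the extension approach, this does not depend on loading an extension, so it should also be usable in headless mode.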