linkedin_scraper
Profile scraping errors
Profile scraping broke for me yesterday. It was working just fine in the morning and then broke in the afternoon with no code changes on my part. Very odd. Is this typical of LinkedIn changing (and breaking) their pages frequently? I've tried headless and non-headless options with the same result. I'd appreciate anyone who can help fix this soon. Thanks.
Code 1:
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pprint
pp = pprint.PrettyPrinter(indent=4)
options = Options()
options.page_load_strategy = 'normal'
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
email = "[email protected]"
password = "XXXX"  # redacted
actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal
profile_url = "https://www.linkedin.com/in/connorbelicic"
person = Person(profile_url, driver=driver)
pp.pprint(person)
Log 1:
File "/Users/ted/sotalented/scraping/linkedin_scraper/main.py", line 19, in <module>
person = Person(profile_url, driver=driver)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 64, in __init__
self.scrape(close_on_complete)
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 89, in scrape
self.scrape_logged_in(close_on_complete=close_on_complete)
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 285, in scrape_logged_in
self.get_experiences()
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 147, in get_experiences
times = work_times.split("·")[0].strip() if work_times else ""
^^^^^^^^^^
UnboundLocalError: cannot access local variable 'work_times' where it is not associated with a value
Process finished with exit code 1
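A note on Log 1: the `UnboundLocalError` means `work_times` is read on person.py line 147 without ever being assigned, which happens when the branch that would set it finds no matching element in the changed LinkedIn markup. A minimal hedged sketch of the usual guard (this is illustrative, not the library's actual code; only the failing line is quoted from the traceback):

```python
# Hypothetical sketch: initialize a default before the element lookups so
# the later read never sees an unbound name, even when LinkedIn's markup
# changes and the assigning branch is skipped.

work_times = ""  # safe default in case the duration element is missing

# ... element lookups would go here; on the changed markup they fail to
# assign work_times, which is what triggers the UnboundLocalError ...

# The line from the traceback now degrades gracefully instead of crashing:
times = work_times.split("·")[0].strip() if work_times else ""
print(repr(times))  # → ''
```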
Code 2:
# Just changed the profile to yield different error result...
profile_url = "https://www.linkedin.com/in/tedcohn"
Log 2:
File "/Users/ted/sotalented/scraping/linkedin_scraper/main.py", line 18, in <module>
person = Person(profile_url, driver=driver)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 64, in __init__
self.scrape(close_on_complete)
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 89, in scrape
self.scrape_logged_in(close_on_complete=close_on_complete)
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 285, in scrape_logged_in
self.get_experiences()
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/linkedin_scraper/person.py", line 131, in get_experiences
position_title = outer_positions[0].find_element(By.TAG_NAME,"span").find_element(By.TAG_NAME,"span").text
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/selenium/webdriver/remote/webelement.py", line 417, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/selenium/webdriver/remote/webelement.py", line 395, in _execute
return self._parent.execute(command, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 346, in execute
self.error_handler.check_response(response)
File "/Users/ted/sotalented/scraping/linkedin_scraper/venv/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"tag name","selector":"span"}
(Session info: chrome=114.0.5735.106); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Same issue too. Must be a LinkedIn update...
EDIT: the issue is the double span element selectors in the line with the error (and the other similar lines). If you do a single span element selection, it works.
Same issue too. I guess LinkedIn updated its webpage layout or something. I got
An error occurred, reason: Message: no such element: Unable to locate element: {"method":"tag name","selector":"span"}
(Session info: chrome=114.0.5735.106)
Hi there, have you already solved the issue? If so, could you please include some of your code showing how you solved it? Thank you so much.
I am also facing the log2 problem. It used to work fine while I was testing it but then it suddenly changed and now I have this error. Happening while scraping person.
Traceback (most recent call last):
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\lists_check.py", line 21, in
person.scrape(close_on_complete=False)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 89, in scrape
self.scrape_logged_in(close_on_complete=close_on_complete)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 285, in scrape_logged_in
self.get_experiences()
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 131, in get_experiences
position_title = outer_positions[0].find_element(By.TAG_NAME,"span").find_element(By.TAG_NAME,"span").text
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 417, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 395, in _execute
return self._parent.execute(command, params)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 346, in execute
self.error_handler.check_response(response)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 245, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"tag name","selector":"span"}
(Session info: chrome=114.0.5735.110); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
The LinkedIn HTML structure for the job title has changed from 'span-span' to 'div-span', so finding the "span" element once is enough:
position_title = outer_positions[0].find_element(By.TAG_NAME,"span").text
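For anyone patching person.py by hand, the one-line fix above can also be written to tolerate both the old and the new markup. The sketch below is illustrative only: it uses stub classes standing in for Selenium WebElements so the logic runs without a browser, and the `NoSuchElementException` defined here is a stand-in for `selenium.common.exceptions.NoSuchElementException`, which is what you would catch in the real file.

```python
# Hypothetical fallback sketch: try the old span-span lookup first, fall
# back to a single span for the new div-span layout.

class NoSuchElementException(Exception):
    """Stand-in for Selenium's exception of the same name."""

class FakeElement:
    """Mimics the tiny slice of the WebElement API used here."""
    def __init__(self, text="", child=None):
        self.text = text
        self._child = child

    def find_element(self, by, value):
        if self._child is None:
            raise NoSuchElementException(value)
        return self._child

def title_text(outer):
    """Return the title whether the markup is span-span (old) or div-span (new)."""
    span = outer.find_element("tag name", "span")
    try:
        return span.find_element("tag name", "span").text  # old layout
    except NoSuchElementException:
        return span.text                                   # new layout

# Old layout: a span nested inside a span.
old = FakeElement(child=FakeElement(text="", child=FakeElement(text="Engineer")))
# New layout: a single span with no nested span.
new = FakeElement(child=FakeElement(text="Engineer"))

print(title_text(old))  # → Engineer
print(title_text(new))  # → Engineer
```

The same try/except shape can be dropped into `get_experiences` (and `get_educations`, discussed below) so a future markup flip doesn't break the scraper outright.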
Thank you very much, this indeed works for position information. However, there seems to be a change in the education section as well, which gave me similar errors. Have you encountered such issues when scraping education information?
@OnePotatoCat Awesome, figured LI's page had changed. @joeyism I notice pull requests going back two years. Is there a faster way to merge good changes into the codebase so LI changes get addressed more rapidly? How can the community help?
Same issue as "get_experiences": the institution_name is now under 'div-span' instead of 'span-span'.
institution_name = outer_positions[0].find_element(By.TAG_NAME,"span").text
PS: I think that's it. I fixed the code a few days ago; I could be missing some other changes.
Thank you for replying! I did make these changes but still got errors. It's quite strange. Thank you very much again!
@OnePotatoCat if you can submit a MR, I'll merge it in and publish a new one
@teddis yea I get busy with my day job, but some of those are very out of date. If you ping me, I should get it
Hey y'all - running into the same issues - I've even changed the code in the person.py file directly (deleting the extra span on job title and institution name and it's still not working. Any help would be appreciated! @OnePotatoCat @teddis @joeyism
from linkedin_scraper import Company, Person, actions
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromiumService
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.core.utils import ChromeType

driver = webdriver.Chrome(service=ChromiumService(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()))

email = "[email protected]"
password = "XXXX"
actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in terminal
person = Person("https://www.linkedin.com/in/collin-mclelland/", driver=driver)
print(person)
Traceback (most recent call last):
File "C:\Users\jakal\PycharmProjects\pythonProject1\main.py", line 16, in
person = Person("https://www.linkedin.com/in/collin-mclelland/", driver=driver)
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\linkedin_scraper\person.py", line 64, in __init__
self.scrape(close_on_complete)
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\linkedin_scraper\person.py", line 89, in scrape
self.scrape_logged_in(close_on_complete=close_on_complete)
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\linkedin_scraper\person.py", line 285, in scrape_logged_in
self.get_experiences()
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\linkedin_scraper\person.py", line 131, in get_experiences
position_title = outer_positions[0].find_element(By.TAG_NAME,"span").find_element(By.TAG_NAME,"span").text
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 426, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 404, in _execute
return self._parent.execute(command, params)
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 440, in execute
self.error_handler.check_response(response)
File "C:\Users\jakal\PycharmProjects\pythonProject1\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 245, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"tag name","selector":"span"}
(Session info: chrome=114.0.5735.134)
Process finished with exit code 1
OK, I've fixed it locally: there's another spot on line 137 of person.py where you have to remove the extra ".text".
How do I limit the person call so that it doesn't return connections/contacts? I can do this with company by saying get_employees=False, is there an equivalent of that for persons? Thanks!