
Profile scraping error: res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")

Open Kiru6ik opened this issue 1 year ago • 8 comments

When scraping a person who worked multiple times at the same organization, this error occurs. I checked the page structure and it should work fine, but for some reason it fails. This part of the code causes the problem:

```python
if position_summary_text and len(position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")) > 1: #.find_element(By.CLASS_NAME,"pvs-list")
    descriptions = position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")
    for description in descriptions:
        res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
        position_title_elem = res[0] if len(res) > 0 else None
        work_times_elem = res[1] if len(res) > 1 else None
        location_elem = res[2] if len(res) > 2 else None
```

It can't find `res` by tag name `a`. As far as I understand, this tries to find the top part of the job description (title, duration at position, location), and all of that sits under an `a` tag on the web page. @joeyism, do you have any insights on that? Am I referring correctly to the part of the page that this code is trying to analyse?
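One way to avoid the crash itself (a sketch, not linkedin_scraper's actual code; the helper name is illustrative) is to use `find_elements`, which returns an empty list instead of raising `NoSuchElementException` when nothing matches:

```python
# Illustrative helper, not part of linkedin_scraper: find_elements returns an
# empty list when nothing matches, so it never raises NoSuchElementException.
def safe_find(parent, by, value):
    """Return the first matching child of `parent`, or None if none match.

    `parent` is any Selenium-style object exposing find_elements(by, value).
    """
    matches = parent.find_elements(by, value)
    return matches[0] if matches else None
```

With this, the failing line could become `anchor = safe_find(description, By.TAG_NAME, "a")` followed by a `None` check, so list items without an `a` tag are skipped rather than aborting the whole scrape.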

The whole error message:

```
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\lists_check.py", line 23, in <module>
    person.scrape(close_on_complete=False)
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 89, in scrape
    self.scrape_logged_in(close_on_complete=close_on_complete)
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 285, in scrape_logged_in
    self.get_experiences()
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 156, in get_experiences
    res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 417, in find_element
    return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 395, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 346, in execute
    self.error_handler.check_response(response)
  File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"tag name","selector":"a"}
  (Session info: chrome=114.0.5735.134); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
Backtrace:
	GetHandleVerifier [0x0025A813+48355]
	(No symbol) [0x001EC4B1]
	(No symbol) [0x000F5358]
	(No symbol) [0x001209A5]
	(No symbol) [0x00120B3B]
	(No symbol) [0x00119AE1]
	(No symbol) [0x0013A784]
	(No symbol) [0x00119A36]
	(No symbol) [0x0013AA94]
	(No symbol) [0x0014C922]
	(No symbol) [0x0013A536]
	(No symbol) [0x001182DC]
	(No symbol) [0x001193DD]
	GetHandleVerifier [0x004BAABD+2539405]
	GetHandleVerifier [0x004FA78F+2800735]
	GetHandleVerifier [0x004F456C+2775612]
	GetHandleVerifier [0x002E51E0+616112]
	(No symbol) [0x001F5F8C]
	(No symbol) [0x001F2328]
	(No symbol) [0x001F240B]
	(No symbol) [0x001E4FF7]
	BaseThreadInitThunk [0x762B0099+25]
	RtlGetAppContainerNamedObjectPath [0x77A97B6E+286]
	RtlGetAppContainerNamedObjectPath [0x77A97B3E+238]
	(No symbol) [0x00000000]

Process finished with exit code 1
```

Kiru6ik · Jun 26 '23 21:06

Can you provide the code that you've used, please?

joeyism · Jun 26 '23 21:06

Sorry, I forgot to include the failing account in the first place. This bug occurred on this profile: https://www.linkedin.com/in/sheanahamill/. The error occurs on any basic person scrape. This is the code I used to discover the bug:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from linkedin_scraper import Person, Company
import time

# Reuse an existing logged-in Chrome profile so no fresh login is needed
options = Options()
options.add_argument("user-data-dir=C:\\Users\\User\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 3")

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)

person = Person("https://www.linkedin.com/in/sheanahamill", driver=driver, scrape=False)
time.sleep(3)
person.scrape(close_on_complete=False)

name = person.name
title = person.job_title
now_company = person.company
print(name, title, now_company)

experience = person.experiences
print(experience)
current_company = experience[0]
print(current_company)
link_to_company = current_company.linkedin_url
print(link_to_company)
location = current_company.location
print(location)

company = Company(link_to_company, driver=driver, get_employees=False, close_on_complete=False)

company_name = company.name
company_size = company.company_size
company_website = company.website
about = company.about_us
print(company_name, company_size, company_website, about)
```

This code works fine with other accounts (other than the login problem from #173).

Kiru6ik · Jun 26 '23 21:06

Hey, I only updated the two functions I needed: get_experiences() and get_name_and_location(). In addition to the UI updates, I also fixed the scraper issue where it gets confused when a person has held multiple positions at the same company over time.

You can selectively scrape by doing this:

```python
person = Person("https://www.linkedin.com/in/sheanahamill", driver=driver, scrape=False)
person.get_experiences()
print(person.experiences)
```

# These methods assume person.py's existing module-level imports (os, By,
# Experience) plus colored from termcolor for the debug prints below.
def get_name_and_location(self):
        main = self.wait_for_element_to_load(by=By.TAG_NAME, name="main")
        top_panels = main.find_elements(By.CLASS_NAME,"pv-text-details__left-panel")
        self.name = top_panels[0].find_elements(By.XPATH,"*")[0].text
        self.location = top_panels[1].find_element(By.TAG_NAME,"span").text

def get_experiences(self): # modified
        url = os.path.join(self.linkedin_url, "details/experience")
        self.driver.get(url)
        self.focus()
        main = self.wait_for_element_to_load(by=By.TAG_NAME, name="main")
        self.scroll_to_half()
        self.scroll_to_bottom()
        main_list = self.wait_for_element_to_load(name="pvs-list", base=main)
        for position in main_list.find_elements(By.XPATH,"li"):
            position = position.find_element(By.CLASS_NAME,"pvs-entity")
            company_logo_elem, position_details = position.find_elements(By.XPATH,"*")

            # company elem
            company_linkedin_url = company_logo_elem.find_element(By.XPATH,"*").get_attribute("href")

            # position details
            position_details_list = position_details.find_elements(By.XPATH,"*")
            position_summary_details = position_details_list[0] if len(position_details_list) > 0 else None
            position_summary_text = position_details_list[1] if len(position_details_list) > 1 else None # skills OR list of positions
            outer_positions = position_summary_details.find_element(By.XPATH,"*").find_elements(By.XPATH,"*")

            if len(outer_positions) == 4:
                position_title = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                company = outer_positions[1].find_element(By.TAG_NAME,"span").text
                work_times = outer_positions[2].find_element(By.TAG_NAME,"span").text
                location = outer_positions[3].find_element(By.TAG_NAME,"span").text
            elif len(outer_positions) == 3:
                if "·" in outer_positions[2].text:
                    position_title = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                    company = outer_positions[1].find_element(By.TAG_NAME,"span").text
                    work_times = outer_positions[2].find_element(By.TAG_NAME,"span").text
                    location = ""
                else:
                    position_title = ""
                    company = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                    work_times = outer_positions[1].find_element(By.TAG_NAME,"span").text
                    location = outer_positions[2].find_element(By.TAG_NAME,"span").text

            elif len(outer_positions) == 2: # this is for when person has multiple pos over time at one company
                company_div, work_times_div = outer_positions
                company = company_div.find_element(By.TAG_NAME,"span").text
                company_linkedin_url = ""
                print(colored(company, 'yellow'))

                positions_list = position_summary_text.find_element(By.CLASS_NAME, "pvs-list").find_element(By.CLASS_NAME, "pvs-list")

                for position in positions_list.find_elements(By.XPATH,"*"):
                    print(colored('count position', "yellow"))
                    position = position.find_element(By.CLASS_NAME,"pvs-entity")
                    position_details_list = position.find_elements(By.XPATH,"*")[1].find_elements(By.XPATH,"*")

                    position_summary_details = position_details_list[0] if len(position_details_list) > 0 else None
                    position_summary_text = position_details_list[1] if len(position_details_list) > 1 else None # skills OR list of positions
                    outer_positions = position_summary_details.find_element(By.XPATH,"*").find_elements(By.XPATH,"*")

                    if len(outer_positions) == 3:
                        position_title = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                        print(colored(position_title, 'yellow'))
                        work_times = outer_positions[1].find_element(By.TAG_NAME,"span").text
                        location = outer_positions[2].find_element(By.TAG_NAME,"span").text
                    else:
                        print('need fix.')

                    if 'work_times' not in locals() and 'work_times' not in globals():
                        work_times = None # modified
                    times = work_times.split("·")[0].strip() if work_times else ""
                    duration = work_times.split("·")[1].strip() if times != "" and len(work_times.split("·")) > 1 else None # modified

                    from_date = " ".join(times.split(" ")[:2]) if times else ""
                    to_date = " ".join(times.split(" ")[3:]) if times else ""

                    if position_summary_text and len(position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")) > 1:
                        descriptions = position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")
                        for description in descriptions:
                            res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
                            position_title_elem = res[0] if len(res) > 0 else None
                            work_times_elem = res[1] if len(res) > 1 else None
                            location_elem = res[2] if len(res) > 2 else None

                            location = location_elem.find_element(By.XPATH,"*").text if location_elem else None
                            position_title = position_title_elem.find_element(By.XPATH,"*").find_element(By.TAG_NAME,"*").text if position_title_elem else ""
                            work_times = work_times_elem.find_element(By.XPATH,"*").text if work_times_elem else ""
                            times = work_times.split("·")[0].strip() if work_times else ""
                            duration = work_times.split("·")[1].strip() if len(work_times.split("·")) > 1 else None
                            from_date = " ".join(times.split(" ")[:2]) if times else ""
                            to_date = " ".join(times.split(" ")[3:]) if times else ""

                            experience = Experience(
                                position_title=position_title,
                                from_date=from_date,
                                to_date=to_date,
                                duration=duration,
                                location=location,
                                description=description,
                                institution_name=company if 'company' in locals() or 'company' in globals() else "Not provided", #modified
                                linkedin_url=company_linkedin_url
                            )
                            self.add_experience(experience)
                    else:
                        description = position_summary_text.text if position_summary_text else ""

                        experience = Experience(
                            position_title=position_title,
                            from_date=from_date,
                            to_date=to_date,
                            duration=duration,
                            location=location,
                            description=description,
                            institution_name=company,
                            linkedin_url=company_linkedin_url
                        )
                        self.add_experience(experience)
                return


            if 'work_times' not in locals() and 'work_times' not in globals():
                work_times = None
            times = work_times.split("·")[0].strip() if work_times else ""
            duration = work_times.split("·")[1].strip() if times != "" and len(work_times.split("·")) > 1 else None

            from_date = " ".join(times.split(" ")[:2]) if times else ""
            to_date = " ".join(times.split(" ")[3:]) if times else ""

            if position_summary_text and len(position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")) > 1:
                descriptions = position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")
                for description in descriptions:
                    res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
                    position_title_elem = res[0] if len(res) > 0 else None
                    work_times_elem = res[1] if len(res) > 1 else None
                    location_elem = res[2] if len(res) > 2 else None

                    location = location_elem.find_element(By.XPATH,"*").text if location_elem else None
                    position_title = position_title_elem.find_element(By.XPATH,"*").find_element(By.TAG_NAME,"*").text if position_title_elem else ""
                    work_times = work_times_elem.find_element(By.XPATH,"*").text if work_times_elem else ""
                    times = work_times.split("·")[0].strip() if work_times else ""
                    duration = work_times.split("·")[1].strip() if len(work_times.split("·")) > 1 else None
                    from_date = " ".join(times.split(" ")[:2]) if times else ""
                    to_date = " ".join(times.split(" ")[3:]) if times else ""

                    experience = Experience(
                        position_title=position_title,
                        from_date=from_date,
                        to_date=to_date,
                        duration=duration,
                        location=location,
                        description=description,
                        institution_name=company if 'company' in locals() or 'company' in globals() else "Not provided",
                        linkedin_url=company_linkedin_url
                    )
                    self.add_experience(experience)
            else:
                description = position_summary_text.text if position_summary_text else ""

                experience = Experience(
                    position_title=position_title,
                    from_date=from_date,
                    to_date=to_date,
                    duration=duration,
                    location=location,
                    description=description,
                    institution_name=company,
                    linkedin_url=company_linkedin_url
                )
                self.add_experience(experience)

This is from about a week ago; hopefully it still works.

khamamoto6 · Jun 28 '23 21:06

I'm still facing the same issue even with this update.

Kiru6ik · Jul 04 '23 18:07

I just deployed a fix. Please try with v2.11.2.

joeyism · Jul 04 '23 20:07

Thanks, it works. I tested it on 2 profiles but haven't tested at scale yet. I'm new to git and don't know how to submit a PR, but company.py doesn't work either. The updates needed are:

  1. Change the class name to `mb6` in line 210: `grid = driver.find_element(By.CLASS_NAME, "mb6")  # used to be artdeco-card.p5.mb4`
  2. Change the class name to `mb1` in line 241: `grid = driver.find_element(By.CLASS_NAME, "mb1")  # used to be mt1`

And now it works for me
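Since LinkedIn renames these utility classes often, one way to soften future breakage is a fallback list of candidate class names. This is only a sketch; the helper name is illustrative, and the candidates echo the old and new names from the steps above:

```python
# Illustrative fallback lookup: try each candidate locator value in order and
# return the first element found, since LinkedIn's class names churn often.
def find_with_fallback(context, by, candidates):
    """`context` is a Selenium-style driver or element exposing
    find_elements(by, value); returns the first match, or None."""
    for value in candidates:
        matches = context.find_elements(by, value)
        if matches:
            return matches[0]
    return None

# e.g. grid = find_with_fallback(driver, By.CLASS_NAME, ["mb6", "artdeco-card.p5.mb4"])
```

Listing the new name first keeps the common case to a single lookup, while old names still work against cached or regional page variants.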

Kiru6ik · Jul 04 '23 21:07


Hi, it is not working for me even after changing the properties as shown. I was testing with https://www.linkedin.com/company/google.

arpit5292 · Aug 16 '23 06:08

The way I troubleshot it:

  1. Identify the part of the scraping that is failing.
  2. Read the error.
  3. Understand what that code is doing and what its function is.
  4. Find the page block that this code is trying to locate (this can be challenging, as sometimes it's not clear).
  5. Find the new element name, class, etc.

You can send the full error message and I can try to help out.
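For steps 4 and 5, dumping the markup around a failing locator makes renamed classes visible directly. `outerHTML` is a standard DOM property readable via Selenium's `get_attribute`; the helper itself is just a sketch:

```python
# Illustrative inspection helper for a Selenium-style element: return its
# serialized markup so renamed class attributes can be read off directly.
def dump_html(element, limit=2000):
    """Return up to `limit` characters of the element's outer HTML."""
    html = element.get_attribute("outerHTML")
    return html[:limit]
```

Calling this on the nearest element that *does* still resolve (e.g. the `main` tag) and printing the result is usually enough to spot what the old class name became.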

Kiru6ik · Aug 16 '23 15:08