linkedin_scraper

Scraping from a list of links

Open HABER7789 opened this issue 1 year ago • 8 comments

Hi, I am really impressed by the scraper you have built and glad to be able to use it. I am facing an issue when scraping a list of people from an Excel file that basically just contains profile links.

The scraper scrapes the first link fine, and it does manage to navigate to the next profile (I can see this in the Chrome window), but it then throws an exception and cannot scrape any further, leaving me with data for only the first person.

I would really appreciate your help with this; I am attaching my code here.

from linkedin_scraper import Person, actions
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
#from selenium.webdriver.chrome.service import Service
import pandas as pd
import openpyxl

chrome_options = Options()
chrome_options.add_argument("--headless")
# pass the options to the driver and use a raw string for the Windows path
driver = webdriver.Chrome(executable_path=r'C:\chromedriver.exe', options=chrome_options)
driver.set_window_size(1920, 1080)

email = "Email"
password = "password"
actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal

dataframe1 = pd.read_excel('People.xlsx') 
links = list(dataframe1['PeopleLinks'])

ExtractedList = []

for i in links:    
    person = Person(i, driver=driver, scrape=False)
    person.scrape(close_on_complete=False)
    ExtractedList.append(person)


for j in ExtractedList:
    print(j)
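
For what it's worth, here is a minimal sketch of how the loop could be hardened so an exception on one profile does not abort the whole run; the try/except wrapper and the `failed_links` list are additions for illustration, not part of the library:

```python
ExtractedList = []
failed_links = []

for link in links:
    try:
        person = Person(link, driver=driver, scrape=False)
        person.scrape(close_on_complete=False)
        ExtractedList.append(person)
    except Exception as exc:
        # Record the link and the error message, then continue with the next profile.
        failed_links.append((link, str(exc)))
        print(f"Failed to scrape {link}: {exc}")

print(f"Scraped {len(ExtractedList)} profiles; {len(failed_links)} failed")
```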


HABER7789 avatar Mar 29 '23 10:03 HABER7789

What's the error that you get?

joeyism avatar Mar 29 '23 11:03 joeyism

> What's the error that you get?

Hey there! (screenshot of the error attached)

HABER7789 avatar Mar 29 '23 11:03 HABER7789

> What's the error that you get?

This is the error I am getting. The issue here is: if it can scrape one person, it should be able to do the same for the others, right? Please do correct me if I am wrong anywhere. Thanks!

HABER7789 avatar Mar 29 '23 11:03 HABER7789
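
If it helps with the "what's the error" question, a small sketch of printing the full traceback for a failing profile, so the exact exception text can be pasted instead of a screenshot; `profile_url` is just a placeholder for one link from the list:

```python
import traceback

# Sketch only: print the complete traceback for a single failing profile
# so the exact exception and failing step can be shared as text.
# `profile_url` is a placeholder for one of the links read from the Excel file.
try:
    person = Person(profile_url, driver=driver, scrape=False)
    person.scrape(close_on_complete=False)
except Exception:
    traceback.print_exc()
```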

Hey! I'm also getting the same error; is it just that the CSS selector has changed?

rizwankaz avatar Apr 12 '23 21:04 rizwankaz

PR #158 solves this issue; it can parse multiple person links.

lusifer021 avatar Apr 18 '23 20:04 lusifer021

> PR #158 solves this issue; it can parse multiple person links.

Thanks a ton, it works! I really appreciate your help here. Cheers, @lusifer021!

HABER7789 avatar Apr 19 '23 04:04 HABER7789

> PR #158 solves this issue; it can parse multiple person links.

> Thanks a ton, it works! I really appreciate your help here. Cheers, @lusifer021!

You're welcome, @HABER7789.

lusifer021 avatar Apr 19 '23 05:04 lusifer021

@joeyism I'm doing this as well, and I wanted to ask how to have it exclude scraping the people during the company scrape. My current code is below, but I wanted to ask since I've got a long list of companies and I don't need the employees piece. Let me know.

import pandas as pd
from linkedin_scraper import Person, Company, actions
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

ser = Service(r"c:\se\chromedriver.exe")
op = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=ser, options=op)

email = "[email protected]"
password = "XXXXXXXXXXX"
actions.login(driver, email, password)  # if email and password aren't given, it'll prompt in the terminal

dataframe1 = pd.read_csv("company_Linkedin_upload.csv")
links = list(dataframe1['linkedin url'])

ExtractedList = []

for i in links:
    company = Company(i, driver=driver, scrape=False, get_employees=False)
    company.scrape(close_on_complete=False)
    ExtractedList.append(company)
    print(company)

for j in ExtractedList:
    print(j)
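
In case it is useful, a rough sketch of flattening the scraped Company objects into a CSV; the attribute names used below (name, about_us, website, industry) and the output filename are assumptions and may need adjusting to the library's actual fields:

```python
import pandas as pd

# Sketch only: collect a few assumed Company attributes into a DataFrame.
rows = []
for company in ExtractedList:
    rows.append({
        "name": getattr(company, "name", None),
        "about_us": getattr(company, "about_us", None),
        "website": getattr(company, "website", None),
        "industry": getattr(company, "industry", None),
    })

pd.DataFrame(rows).to_csv("companies_scraped.csv", index=False)  # hypothetical output file
```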

jakalfayan avatar Apr 25 '23 18:04 jakalfayan