scrapedin-linkedin-crawler linkedin website changed and can not read basic data

Open cyanide2019 opened this issue 4 years ago • 11 comments

finished scraping url: https://www.linkedin.com/in/inmudassar-iqbal-a9a9159b/
scrapedin: 2020-01-22T07:42:56.489Z error: [cleanMessageData] LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:56.490Z error: error on crawling profile: https://linkedin/in/mudassar-iqbal-a9a9159b/ Error: LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:56.830Z info: starting scraping: https://linkedin/in/nadeem-aslam-057341102/
scrapedin: 2020-01-22T07:42:56.830Z info: [profile] starting scraping url: https://www.linkedin.com/in/innadeem-aslam-057341102/
scrapedin: 2020-01-22T07:42:58.070Z info: [profile] finished scraping url: https://www.linkedin.com/in/inamjad-khan-a03634b7/
scrapedin: 2020-01-22T07:42:58.070Z error: [cleanMessageData] LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:58.070Z error: error on crawling profile: https://linkedin/in/amjad-khan-a03634b7/ Error: LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:58.832Z info: starting scraping: https://linkedin/in/baraa-faisal-0529a5a3/
scrapedin: 2020-01-22T07:42:58.833Z info: [profile] starting scraping url: https://www.linkedin.com/in/inbaraa-faisal-0529a5a3/

Jan 22 '20 07:01 cyanide2019

2020-01-23T02:23:36.378Z error: error on crawling profile: https://linkedin.com/in/ahmad-abdelqader-pmp-osha-iso-70493882/ Error: EACCES: permission denied, open './crawledProfiles/ahmad-abdelqader-pmp-osha-iso-70493882.json'
scrapedin: 2020-01-23T02:23:36.555Z info: [profile] finished scraping url: https://www.linkedin.com/in/ibrahim-saadeddine-1320b8100
2020-01-23T02:23:36.556Z error: error on crawling profile: https://linkedin.com/in/ibrahim-saadeddine-1320b8100/ Error: EACCES: permission denied, open './crawledProfiles/ibrahim-saadeddine-1320b8100.json'
2020-01-23T02:23:36.959Z info: starting scraping: https://linkedin.com/in/usman-mohammed-41332845/
scrapedin: 2020-01-23T02:23:36.959Z info: [profile] starting scraping url: https://www.linkedin.com/in/usman-mohammed-41332845
2020-01-23T02:23:37.960Z info: starting scraping: https://linkedin.com/in/smfaisal29/
scrapedin: 2020-01-23T02:23:37.960Z info: [profile] starting scraping url: https://www.linkedin.com/in/smfaisal29
scrapedin: 2020-01-23T02:23:41.554Z info: [profile] scrolling page to the bottom
scrapedin: 2020-01-23T02:23:42.066Z info: [scrollToPageBottom] scrolling to page bottom (1)
scrapedin: 2020-01-23T02:23:42.624Z info: [scrollToPageBottom] scrolling to page bottom (2)
scrapedin: 2020-01-23T02:23:42.988Z info: [profile] applying 1st delay
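
The EACCES lines in this log mean the process was refused write access when opening files under './crawledProfiles/'. The thread does not say why; one common cause (an assumption here) is a directory or files owned by a different user, for example after an earlier run with sudo. A minimal sketch for checking this before starting a crawl, using only Node's built-in fs module. The helper name isWritableDir is hypothetical and not part of the crawler:

```javascript
const fs = require('fs');

// Hypothetical helper (not part of scrapedin or the crawler):
// returns true when `dir` exists, is a directory, and the current
// process is allowed to write to it.
function isWritableDir(dir) {
  try {
    if (!fs.statSync(dir).isDirectory()) return false;
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch (err) {
    // A missing path (ENOENT) and a permission problem (EACCES) both land here.
    return false;
  }
}

// './crawledProfiles' is the directory named in the log above.
console.log(isWritableDir('./crawledProfiles'));
```

If this prints false while the directory exists, inspecting its ownership and permissions (for example with ls -ld) would be a reasonable next step.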

Jan 23 '20 02:01 cyanide2019

Same problem

Feb 11 '20 03:02 Zackhardtoname

@Zackhardtoname Are you logging in with a company/recruiter profile or a regular employee one?

Please set isHeadless to false in config.json. This opens the browser while crawling, so you can check whether you are really logged in (look at the LinkedIn top bar).

Also confirm that the scrapedin version in your package.json is 1.0.11.

@cyanide2019 could you do the same, please? I couldn't reproduce this error; it's working here. Thanks.
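
The two checks requested above can be sketched as a small script. This is only an illustration: checkSetup is a hypothetical helper, it assumes scrapedin is listed under "dependencies" in package.json, and it inspects the declared version range rather than the version actually installed.

```javascript
// Hypothetical helper: takes already-parsed config.json and package.json
// objects and returns a list of human-readable problems (empty if none).
function checkSetup(config, pkg, expectedVersion = '1.0.11') {
  const problems = [];
  // isHeadless is the config.json key named in this thread.
  if (config.isHeadless !== false) {
    problems.push('isHeadless should be false so the browser window is visible');
  }
  // Assumption: scrapedin is declared under "dependencies".
  const declared = (pkg.dependencies || {}).scrapedin;
  if (!declared) {
    problems.push('scrapedin is not listed under dependencies');
  } else if (!declared.includes(expectedVersion)) {
    problems.push(`scrapedin is declared as ${declared}, expected ${expectedVersion}`);
  }
  return problems;
}

// Inline objects for illustration; in practice they would come from
// JSON.parse(fs.readFileSync('config.json', 'utf8')) and likewise for package.json.
console.log(checkSetup(
  { isHeadless: true },
  { dependencies: { scrapedin: '^1.0.9' } }
));
```

Running npm ls scrapedin in the project directory shows the installed version, which can differ from the declared range.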

Feb 11 '20 15:02 leonardiwagner

Regular employee

Feb 11 '20 15:02 Zackhardtoname

@Zackhardtoname Please apply the configuration changes mentioned above and post the results here when you can.

Feb 11 '20 15:02 leonardiwagner

Yes, it worked for me. Now the issue is that when I try to gather profile links from LinkedIn, they send me a warning, block my account, and threaten a permanent block if I continue to send automated queries. How can I bypass this mechanism?

Feb 11 '20 18:02 cyanide2019

@cyanide2019 Were you able to find a solution for this?

Apr 08 '20 14:04 Aditya94A

What is the use of "rootProfiles": [ "https://www.linkedin.com/in/place/", "https://www.linkedin.com/in/here/", "https://www.linkedin.com/in/profiles/", "https://www.linkedin.com/in/to-start-the-crawler/" ] in config.json?

Also, I want to search for profiles based on particular keywords, but "relatedProfilesKeywords": ["javascript"] does not seem to work.
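
The placeholder values in "rootProfiles" read "place here profiles to start the crawler", which suggests they are the seed profile URLs for the crawl; the thread itself does not confirm this. The first log in this issue shows root URLs of the form https://linkedin/in/... (no ".com"), and whether that contributed to the error is not established. A minimal sketch for sanity-checking the entries before a run, using Node's built-in URL class. The function name and the accepted URL shape are assumptions, not the crawler's own validation:

```javascript
// Hypothetical validator (not part of the crawler): returns the entries
// that do not look like https://www.linkedin.com/in/<slug>/ profile URLs.
function findBadProfileUrls(rootProfiles) {
  return rootProfiles.filter((entry) => {
    try {
      const url = new URL(entry);
      const hostOk = url.hostname === 'linkedin.com' || url.hostname === 'www.linkedin.com';
      // Exactly one path segment after /in/, with an optional trailing slash.
      const pathOk = /^\/in\/[^/]+\/?$/.test(url.pathname);
      return !(hostOk && pathOk);
    } catch (err) {
      return true; // not parseable as a URL at all
    }
  });
}

console.log(findBadProfileUrls([
  'https://www.linkedin.com/in/place/',
  'https://linkedin/in/mudassar-iqbal-a9a9159b/',
]));
```

Here the second entry is reported because its host is "linkedin" rather than "linkedin.com".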

May 15 '20 11:05 pushparmar

@cyanide2019 Is there any way to give the crawler particular keywords so that it searches through all available profiles based on those keywords only?

May 15 '20 13:05 PriyaJainDev

It's a little hard to follow what was happening here, but I think I had the same problem. Login from credentials doesn't work with headless, but everything works fine with the "headed" browser. Headless works fine with cookies for me though.

I suspect that they might just be checking the user-agent in the header and refusing to log you in or giving you a captcha if it says that it's headless. I might do some experimentation there if I find I need headless login.
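
As a starting point for that experiment, the hypothesis can at least be made checkable: headless Chrome's default user-agent string contains the token "HeadlessChrome". The sketch below only detects that token; the helper name and the sample strings are illustrative, and nothing here confirms that LinkedIn actually inspects the user-agent.

```javascript
// Hypothetical helper: returns true when a user-agent string
// advertises headless Chrome via the "HeadlessChrome" token.
function looksHeadless(userAgent) {
  return /HeadlessChrome/i.test(userAgent);
}

// Illustrative user-agent strings, not captured from a real session.
const headlessUa = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36';
const headedUa = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.0 Safari/537.36';

console.log(looksHeadless(headlessUa), looksHeadless(headedUa));
```

With Puppeteer, the string to test could be obtained from browser.userAgent() and compared between headless and headed runs.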

May 27 '20 14:05 ThomasProctor

If I get the time, I'll do some more experimentation and open a separate issue if I really have a diagnosable problem.

May 27 '20 14:05 ThomasProctor
