
User error?

Open · claytondukes opened this issue on Oct 10 '23 · 2 comments

Firstly, I love that you've made this. However, I'm having some trouble getting it to work properly, and I suspect it's just a user or documentation error. I don't quite understand the difference between:

make companies
or
make random
or
make byname

Which one is for what? If I run make companies, I get the log below. If I connect to the VNC session, I complete the security check and it passes. The scraper then logs in, loads my LinkedIn page, and just sits there doing nothing until it exits.

Successfully built 4d01d326b5743b603e198a3c558391123e315f4965f96722a5d4b4703b967ab7
docker-compose up scrapy_companies
selenium is up-to-date
Starting linkedin_scrapy_companies_1 ... done
Attaching to linkedin_scrapy_companies_1
scrapy_companies_1  | --2023-10-10 18:38:35--  http://selenium:4444/wd/hub
scrapy_companies_1  | Resolving selenium (selenium)... 172.21.0.2
scrapy_companies_1  | Connecting to selenium (selenium)|172.21.0.2|:4444... connected.
scrapy_companies_1  | HTTP request sent, awaiting response... 302 Found
scrapy_companies_1  | Location: http://selenium:4444/wd/hub/static/resource/hub.html [following]
scrapy_companies_1  | --2023-10-10 18:38:35--  http://selenium:4444/wd/hub/static/resource/hub.html
scrapy_companies_1  | Reusing existing connection to selenium:4444.
scrapy_companies_1  | HTTP request sent, awaiting response... 200 OK
scrapy_companies_1  | Length: 160 [text/html]
scrapy_companies_1  | Saving to: ‘STDOUT’
scrapy_companies_1  |
scrapy_companies_1  |      0K                                                       100%<!DOCTYPE html>
scrapy_companies_1  | <title>WebDriver Hub</title>
scrapy_companies_1  | <link rel="stylesheet" href="style.css">
scrapy_companies_1  | <script src="client.js"></script>
scrapy_companies_1  | <body>
scrapy_companies_1  | <script>init();</script>
scrapy_companies_1  | </body>
scrapy_companies_1  |  29.7M=0s
scrapy_companies_1  |
scrapy_companies_1  | 2023-10-10 18:38:35 (29.7 MB/s) - written to stdout [160/160]
scrapy_companies_1  |
scrapy_companies_1  | Selenium is up - executing command
scrapy_companies_1  | INFO:root:***** SECURITY CHECK IN PROGRESS *****
scrapy_companies_1  | INFO:root:Please perform the security check on selenium, you have 30 seconds...
scrapy_companies_1  | INFO:root:***** SECURITY CHECK COMPLETED *****
linkedin_scrapy_companies_1 exited with code 0

claytondukes, Oct 10 '23

Just to be sure it wasn't something I changed, I checked out the repo again, set up my conf.py, and ran make companies. Note: on the first run, it fails with:

scrapy_companies_1  |   File "sequential_run.py", line 42, in <module>
scrapy_companies_1  |     open(file_name, "w").close()
scrapy_companies_1  | FileNotFoundError: [Errno 2] No such file or directory: 'data/companies/data.csv'

So I ran mkdir data/companies and touch data/companies/data.csv, then ran it again.

Now it's "running", but the VNC session just sits on my user's LinkedIn homepage and never clicks or does anything. The make companies log stays at the following and never exits:

Recreating linkedin_scrapy_companies_1 ... done
Attaching to linkedin_scrapy_companies_1
scrapy_companies_1  | --2023-10-10 18:49:44--  http://selenium:4444/wd/hub
scrapy_companies_1  | Resolving selenium (selenium)... 172.21.0.2
scrapy_companies_1  | Connecting to selenium (selenium)|172.21.0.2|:4444... connected.
scrapy_companies_1  | HTTP request sent, awaiting response... 302 Found
scrapy_companies_1  | Location: http://selenium:4444/wd/hub/static/resource/hub.html [following]
scrapy_companies_1  | --2023-10-10 18:49:44--  http://selenium:4444/wd/hub/static/resource/hub.html
scrapy_companies_1  | Reusing existing connection to selenium:4444.
scrapy_companies_1  | HTTP request sent, awaiting response... 200 OK
scrapy_companies_1  | Length: 160 [text/html]
scrapy_companies_1  | Saving to: ‘STDOUT’
scrapy_companies_1  | <!DOCTYPE html>
scrapy_companies_1  | <title>WebDriver Hub</title>
scrapy_companies_1  | <link rel="stylesheet" href="style.css">
scrapy_companies_1  | <script src="client.js"></script>
scrapy_companies_1  | <body>
scrapy_companies_1  | <script>init();</script>
scrapy_companies_1  | </body>
scrapy_companies_1  |
scrapy_companies_1  |      0K                                                       100% 32.7M=0s
scrapy_companies_1  |
scrapy_companies_1  | 2023-10-10 18:49:44 (32.7 MB/s) - written to stdout [160/160]
scrapy_companies_1  |
scrapy_companies_1  | Selenium is up - executing command

claytondukes, Oct 10 '23
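For the FileNotFoundError on the first run: the traceback shows sequential_run.py calling open(file_name, "w").close() on 'data/companies/data.csv' before the data/companies directory exists. A minimal sketch of a possible fix, assuming file_name is a plain relative path like the one in the traceback; the helper name reset_output_file is hypothetical and not part of the repo:

```python
import os


def reset_output_file(file_name: str) -> None:
    """Create or truncate the CSV output file, creating parent directories first."""
    parent = os.path.dirname(file_name)
    if parent:
        # Same effect as the manual `mkdir data/companies` workaround;
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    # Same effect as the original `open(file_name, "w").close()`:
    # creates an empty file, or truncates an existing one.
    with open(file_name, "w"):
        pass
```

Calling reset_output_file(file_name) in place of the bare open(...) on line 42 would make the manual mkdir/touch step unnecessary. It only addresses the crash, not the scraper idling afterwards.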

Mine does exactly the same thing. I've put the company URL in the companies.txt file, and nothing happens, either in VNC or anywhere else.

raithedavion, Dec 19 '23
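Since nothing happens even with a URL in companies.txt, one thing worth ruling out is that the copy of the file the scraper actually reads is missing or empty. A minimal Python sketch of that check, assuming the file holds one company URL per line; the helper name read_company_urls and its default path are hypothetical, not taken from the repo:

```python
from pathlib import Path
from typing import List


def read_company_urls(path: str = "companies.txt") -> List[str]:
    """Return the non-blank lines of the companies file, stripped of whitespace.

    Assumes one company URL per line; the scraper's real format may differ.
    """
    p = Path(path)
    if not p.is_file():
        # A missing file would leave the scraper with nothing to visit.
        raise FileNotFoundError(f"{p.resolve()} does not exist")
    lines = p.read_text(encoding="utf-8").splitlines()
    # Drop blank or whitespace-only lines.
    return [line.strip() for line in lines if line.strip()]
```

Running read_company_urls() from the same working directory the scraper uses and printing len(...) of the result shows whether any URLs are visible there. Zero URLs would be consistent with a session that logs in and then idles.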