User error?
Firstly, I love that you've made this. However, I'm having some trouble getting it to work properly, and I think it may just be a user/documentation error. I don't quite get the difference between make companies, make random, and make byname. Which one is for what?
If I try make companies, I get the following in the log. If I connect to the VNC, I do the security check and it passes. The scraper then logs in, loads my LinkedIn page, and just sits there doing nothing until it eventually exits.
Successfully built 4d01d326b5743b603e198a3c558391123e315f4965f96722a5d4b4703b967ab7
docker-compose up scrapy_companies
selenium is up-to-date
Starting linkedin_scrapy_companies_1 ... done
Attaching to linkedin_scrapy_companies_1
scrapy_companies_1 | --2023-10-10 18:38:35-- http://selenium:4444/wd/hub
scrapy_companies_1 | Resolving selenium (selenium)... 172.21.0.2
scrapy_companies_1 | Connecting to selenium (selenium)|172.21.0.2|:4444... connected.
scrapy_companies_1 | HTTP request sent, awaiting response... 302 Found
scrapy_companies_1 | Location: http://selenium:4444/wd/hub/static/resource/hub.html [following]
scrapy_companies_1 | --2023-10-10 18:38:35-- http://selenium:4444/wd/hub/static/resource/hub.html
scrapy_companies_1 | Reusing existing connection to selenium:4444.
scrapy_companies_1 | HTTP request sent, awaiting response... 200 OK
scrapy_companies_1 | Length: 160 [text/html]
scrapy_companies_1 | Saving to: ‘STDOUT’
scrapy_companies_1 |
scrapy_companies_1 | 0K 100%<!DOCTYPE html>
scrapy_companies_1 | <title>WebDriver Hub</title>
scrapy_companies_1 | <link rel="stylesheet" href="style.css">
scrapy_companies_1 | <script src="client.js"></script>
scrapy_companies_1 | <body>
scrapy_companies_1 | <script>init();</script>
scrapy_companies_1 | </body>
scrapy_companies_1 | 29.7M=0s
scrapy_companies_1 |
scrapy_companies_1 | 2023-10-10 18:38:35 (29.7 MB/s) - written to stdout [160/160]
scrapy_companies_1 |
scrapy_companies_1 | Selenium is up - executing command
scrapy_companies_1 | INFO:root:***** SECURITY CHECK IN PROGRESS *****
scrapy_companies_1 | INFO:root:Please perform the security check on selenium, you have 30 seconds...
scrapy_companies_1 | INFO:root:***** SECURITY CHECK COMPLETED *****
linkedin_scrapy_companies_1 exited with code 0
Just to be sure it wasn't something I changed, I checked the repo out again, set my conf.py, then ran make companies.
NB: On first run, it errors with:
scrapy_companies_1 | File "sequential_run.py", line 42, in <module>
scrapy_companies_1 | open(file_name, "w").close()
scrapy_companies_1 | FileNotFoundError: [Errno 2] No such file or directory: 'data/companies/data.csv'
so I just did a mkdir data/companies and touch data/companies/data.csv, then ran it again.
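For anyone hitting the same first-run error, here is a minimal sketch of a workaround on the Python side. It assumes the only problem is that the data/companies directory does not exist yet when sequential_run.py opens the file. The helper name ensure_empty_file is hypothetical and not part of the repo:

```python
from pathlib import Path


def ensure_empty_file(file_name: str) -> Path:
    """Create (or truncate) file_name, creating missing parent directories first.

    Hypothetical helper: a drop-in idea for the bare
    open(file_name, "w").close() call shown in the traceback.
    """
    path = Path(file_name)
    # Create data/companies (and any other missing parents) if needed.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Opening in "w" mode creates the file or truncates an existing one.
    path.open("w").close()
    return path


if __name__ == "__main__":
    # Same path as in the error message above.
    ensure_empty_file("data/companies/data.csv")
```

This does the same thing as the manual mkdir and touch, so a fresh checkout would not need that extra step.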
Now it's "running", but the VNC just sits at the main homepage for my user and never clicks or does anything. The log for make companies stays at the following and never exits:
Recreating linkedin_scrapy_companies_1 ... done
Attaching to linkedin_scrapy_companies_1
scrapy_companies_1 | --2023-10-10 18:49:44-- http://selenium:4444/wd/hub
scrapy_companies_1 | Resolving selenium (selenium)... 172.21.0.2
scrapy_companies_1 | Connecting to selenium (selenium)|172.21.0.2|:4444... connected.
scrapy_companies_1 | HTTP request sent, awaiting response... 302 Found
scrapy_companies_1 | Location: http://selenium:4444/wd/hub/static/resource/hub.html [following]
scrapy_companies_1 | --2023-10-10 18:49:44-- http://selenium:4444/wd/hub/static/resource/hub.html
scrapy_companies_1 | Reusing existing connection to selenium:4444.
scrapy_companies_1 | HTTP request sent, awaiting response... 200 OK
scrapy_companies_1 | Length: 160 [text/html]
scrapy_companies_1 | Saving to: ‘STDOUT’
scrapy_companies_1 | <!DOCTYPE html>
scrapy_companies_1 | <title>WebDriver Hub</title>
scrapy_companies_1 | <link rel="stylesheet" href="style.css">
scrapy_companies_1 | <script src="client.js"></script>
scrapy_companies_1 | <body>
scrapy_companies_1 | <script>init();</script>
scrapy_companies_1 | </body>
scrapy_companies_1 |
scrapy_companies_1 | 0K 100% 32.7M=0s
scrapy_companies_1 |
scrapy_companies_1 | 2023-10-10 18:49:44 (32.7 MB/s) - written to stdout [160/160]
scrapy_companies_1 |
scrapy_companies_1 | Selenium is up - executing command
Mine does the exact same thing. I've put the company URL in the "companies.txt" file, and still nothing happens, via VNC or otherwise.