[Crawler + Scraper] LinkedIn Public Directory Companies

Prerequisites

Python 3.7: sudo apt-get install python3.7
Pip: sudo apt-get install python3-pip
VirtualEnv: sudo pip3 install virtualenv
MongoDB, with the collections linkedin_companies, linkedin_crawlers and linkedin_scrapers
Write permission in the app directory, where cookies are saved
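Two of the prerequisites above can be checked locally with the standard library alone: the Python version and write access to the app directory. The sketch below does only that; MongoDB still has to be verified separately. The function check_prerequisites is hypothetical and not part of the project.

```python
import os
import sys


def check_prerequisites(app_dir=".", min_python=(3, 7)):
    """Return a dict mapping each local prerequisite to True/False.

    MongoDB is not checked here because that would need a third-party driver.
    """
    return {
        # Only checks that the interpreter is at least the given version.
        "python_version": sys.version_info[:2] >= min_python,
        # The app saves cookies in its own directory, so it needs write access.
        "writable_app_dir": os.access(app_dir, os.W_OK),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```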

Considerations

To run the crawler and scraper at scale, you will need a residential proxy server.

Installation

Clone the project:

git clone git@github.com:robertoarruda/linkedin-public-dir-companies.git

Enter the project directory:

cd ./linkedin-public-dir-companies

Create the environment:

Within the project root, run the command below:

virtualenv venv --python=python3.7

Activate the environment:

Run the command below to activate it:

source venv/bin/activate
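If you are unsure whether the environment is active, the interpreter itself can tell you. The sketch below relies on documented interpreter behaviour: venv and virtualenv 20+ make sys.prefix differ from sys.base_prefix, while older virtualenv releases set sys.real_prefix instead. The function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment directory,
    # while sys.base_prefix keeps pointing at the base Python installation.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return True
    # Legacy virtualenv (< 20) defined sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("virtualenv active" if in_virtualenv() else "virtualenv NOT active")
```

Running it with the venv's python after activation should report the environment as active; running it after deactivate should not.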

Install dependencies:

Run the command below to install the project dependencies:

pip install -r requirements.txt

Configure MongoDB

Enter the database connection settings in the client_db.py file:

class ClientDB():
    # Replace the user, password, host and port with your own MongoDB settings.
    __MONGO = 'mongodb://root:<password>@<host>:80'
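The __MONGO value is a standard MongoDB connection string. MongoDB requires characters such as @, : and / in the username or password to be percent-encoded, otherwise the string is parsed incorrectly. The helper below is a hypothetical sketch (not part of the project) that builds a safe string you could paste into client_db.py; the credentials shown are placeholders. Port 80 mirrors the example in this README, while MongoDB's own default port is 27017.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port=27017):
    """Build a mongodb:// connection string with percent-encoded credentials."""
    return "mongodb://{}:{}@{}:{}".format(
        quote_plus(user), quote_plus(password), host, port
    )


# Placeholder credentials and host; substitute your own.
print(build_mongo_uri("root", "s3cr3t@pass", "127.0.0.1", 80))
# mongodb://root:s3cr3t%40pass@127.0.0.1:80
```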

[Optional step] Set up a residential proxy

Enter the address of your residential proxy server in the main.py file:

class Main():
    __PROXIES = {
        'http': 'http://127.0.0.1:80'
    }
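The __PROXIES dict maps a URL scheme to a proxy address. HTTP clients that select the proxy by scheme (Python's urllib and the requests library both do this) only proxy requests whose scheme appears as a key, so a dict with only an 'http' entry leaves HTTPS requests unproxied. Whether this matters depends on how main.py uses the dict, which this README does not show. The sketch below, with hypothetical helper names, maps both schemes to the same proxy and wires the dict into a standard-library urllib opener.

```python
import urllib.request


def build_proxies(proxy_url):
    """Map both schemes to the same proxy so HTTPS traffic is proxied too."""
    return {"http": proxy_url, "https": proxy_url}


def make_proxy_opener(proxies):
    """Return a urllib opener that routes requests through the given proxies."""
    # Passing a ProxyHandler instance replaces urllib's default, environment-based one.
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Hypothetical proxy address; replace it with your provider's endpoint.
proxies = build_proxies("http://127.0.0.1:80")
opener = make_proxy_opener(proxies)
# opener.open(url) would now send both http:// and https:// requests via the proxy.
```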

Execute the crawler:

Execute the command below to run the crawler:

python main.py crawler

The crawler data is saved in the linkedin_crawlers collection. The crawled companies are saved in the linkedin_companies collection.
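The split between the two collections can be illustrated with an in-memory stand-in: one record per crawler run goes into linkedin_crawlers, and each discovered company is registered once in linkedin_companies. This is a sketch under assumptions, not the project's code; the function save_crawl and all field names (url, scraped, companies_found, companies_new) are hypothetical.

```python
from datetime import datetime, timezone


def save_crawl(db, company_urls):
    """Record one crawler run and register every discovered company once.

    `db` maps collection names to dicts keyed by document id, standing in
    for MongoDB collections. Field names are hypothetical.
    """
    run = {
        "started_at": datetime.now(timezone.utc),
        "companies_found": len(company_urls),
    }
    runs = db.setdefault("linkedin_crawlers", {})
    runs[len(runs)] = run

    companies = db.setdefault("linkedin_companies", {})
    new = 0
    for url in company_urls:
        # Keyed by URL so crawling the directory again does not create duplicates.
        if url not in companies:
            companies[url] = {"url": url, "scraped": False}
            new += 1
    run["companies_new"] = new
    return run
```

With pymongo, the same "insert only if missing" behaviour is usually written as update_one({"url": url}, {"$setOnInsert": {...}}, upsert=True).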

Execute the scraper:

Execute the command below to run the scraper:

python main.py scraper

The scraper data is saved in the linkedin_scrapers collection. The scraped companies are updated in the linkedin_companies collection.
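Continuing the same in-memory model, the scraper step can be sketched as: pick the companies not yet scraped, update each document in place, and log the run in linkedin_scrapers. The function scrape_pending, the fetch_details callable and the field names are hypothetical; the real scraper's selection criteria and fields are not described in this README.

```python
def scrape_pending(db, fetch_details, limit=None):
    """Fill in details for registered companies that have not been scraped yet.

    `db` maps collection names to dicts of documents; `fetch_details` is a
    hypothetical callable taking a company URL and returning a dict of fields.
    """
    companies = db.setdefault("linkedin_companies", {})
    pending = [c for c in companies.values() if not c.get("scraped")]
    if limit is not None:
        pending = pending[:limit]

    updated = 0
    for company in pending:
        # Update the existing document in place, like a MongoDB "$set" update.
        company.update(fetch_details(company["url"]))
        company["scraped"] = True
        updated += 1

    runs = db.setdefault("linkedin_scrapers", {})
    runs[len(runs)] = {"companies_updated": updated}
    return updated
```

Marking each document as scraped lets a later run resume where the previous one stopped instead of fetching every company again.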

Turn off the environment:

Execute the command below to deactivate:

deactivate
