undetectable-crawler
A Node.js script powered by Puppeteer for undetectable web scraping
Undetectable Crawler
This is a Node.js script that uses Puppeteer with additional settings to build a web crawler that is harder to detect. It lets you scrape websites while reducing the risk of being blocked or identified as a bot.
Features
- Bypasses common bot detection mechanisms.
- Customizable settings for stealthy web scraping.
- Easily extensible for your specific scraping needs.
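As a rough illustration of what "stealth" settings can look like, the sketch below applies two common tweaks to a Puppeteer page: it removes the `HeadlessChrome` marker from the user agent and hides `navigator.webdriver`. The helper names `normalizeUserAgent` and `openStealthPage` and the viewport size are assumptions made for this example; they are not taken from this repository's `crawler.js`, and they cover only a small part of what detection pages check.

```javascript
// Hypothetical helper (not from this repository): removes the "Headless"
// marker that older headless Chrome builds put in the user agent string.
function normalizeUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}

// Hypothetical helper: opens a page on an already launched Puppeteer
// browser and applies a few basic anti-detection tweaks.
async function openStealthPage(browser) {
  const page = await browser.newPage();

  // Reuse the browser's own user agent, minus the headless marker.
  const defaultUserAgent = await browser.userAgent();
  await page.setUserAgent(normalizeUserAgent(defaultUserAgent));

  // Runs before any page script: report navigator.webdriver as undefined.
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  });

  // Assumed desktop-like viewport; adjust to your needs.
  await page.setViewport({ width: 1366, height: 768 });
  return page;
}
```

A caller would launch a browser with `puppeteer.launch()`, pass it to `openStealthPage`, and then call `page.goto(...)` as usual.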
Chrome’s Headless mode gets an upgrade
- https://developer.chrome.com/docs/chromium/new-headless
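The article above introduces Chrome's new headless mode, which Chrome itself enables with the `--headless=new` flag. How Puppeteer selects it depends on the Puppeteer version, so the sketch below is an assumption to check against your installed version: it assumes that releases before v22 opt in with `headless: 'new'`, while v22 and later already use the new mode for `headless: true`. The helper name `buildLaunchOptions` is hypothetical.

```javascript
// Hypothetical helper (not from this repository): picks the `headless`
// launch option that selects Chrome's new headless mode.
// Assumption: Puppeteer < 22 needs headless: 'new'; Puppeteer >= 22 uses
// the new mode when headless is simply true.
function buildLaunchOptions(puppeteerMajorVersion) {
  const headless = puppeteerMajorVersion >= 22 ? true : 'new';
  return { headless };
}

// Usage sketch (requires the `puppeteer` package to be installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(21));
```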
Proxy
Use a reliable residential proxy list, such as the one available from BrightData, to keep crawling smooth and efficient and to reduce the risk of IP bans and detection.
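One way to route Puppeteer through a proxy is Chrome's `--proxy-server` flag, with `page.authenticate` supplying credentials when the proxy asks for them. The sketch below assumes a proxy given as a single URL; the helper names and the `proxy.example.com` address are placeholders, not values from this repository or from any provider.

```javascript
// Hypothetical helper (not from this repository): splits a proxy URL into
// the Chrome launch flag and the optional credentials.
function buildProxyConfig(proxyUrl) {
  const parsed = new URL(proxyUrl);
  return {
    // Chrome expects the proxy address without credentials in the flag.
    launchArg: `--proxy-server=${parsed.protocol}//${parsed.host}`,
    credentials: parsed.username
      ? {
          username: decodeURIComponent(parsed.username),
          password: decodeURIComponent(parsed.password),
        }
      : null,
  };
}

// Hypothetical helper: loads a page through the proxy and returns its title.
async function openPageThroughProxy(proxyUrl, targetUrl) {
  // Loaded lazily so buildProxyConfig can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const { launchArg, credentials } = buildProxyConfig(proxyUrl);
  const browser = await puppeteer.launch({ args: [launchArg] });
  try {
    const page = await browser.newPage();
    if (credentials) {
      // Answers the proxy's authentication challenge.
      await page.authenticate(credentials);
    }
    await page.goto(targetUrl, { waitUntil: 'networkidle2' });
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

For example, `openPageThroughProxy('http://user:pass@proxy.example.com:8080', 'https://bot.sannysoft.com/')` would load the test page through that placeholder proxy.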
Installation using Docker
- Clone this repository:
git clone git@github.com:darkotodoric/undetectable-crawler.git
cd undetectable-crawler
- Build the Docker image
docker-compose build
- Install npm packages
docker-compose run --rm undetectable-nodejs-service npm install
- Run the crawler
docker-compose run --rm undetectable-nodejs-service node crawler.js https://bot.sannysoft.com/
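The last command passes the target URL as the first command-line argument. A minimal sketch of how a script could read and validate that argument, then save a screenshot of the page, is shown below. It is an assumption about the general shape of such a script, not the actual contents of `crawler.js`; `parseTargetUrl`, `main`, and the `result.png` output path are hypothetical.

```javascript
// Hypothetical helper (not from this repository): reads the target URL
// from an argv-style array and validates it.
function parseTargetUrl(argv) {
  const raw = argv[2]; // argv[0] is node, argv[1] is the script path
  if (!raw) {
    throw new Error('Usage: node crawler.js <url>');
  }
  const url = new URL(raw); // throws a TypeError on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.href;
}

// Hypothetical entry point: loads the URL and stores a full-page screenshot.
async function main() {
  const targetUrl = parseTargetUrl(process.argv);
  const puppeteer = require('puppeteer'); // loaded lazily, only when crawling
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(targetUrl, { waitUntil: 'networkidle2' });
    // Assumed output path; a detection test page is easiest to review visually.
    await page.screenshot({ path: 'result.png', fullPage: true });
  } finally {
    await browser.close();
  }
}

// To run: call main().catch((err) => { console.error(err); process.exitCode = 1; });
```

Validating the argument before launching the browser keeps a typo in the URL from costing a full Chrome startup.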
Contributing
Contributions are welcome! Feel free to open issues or submit pull requests to improve this project.