foundation
🧱 A uniform template to use as a foundation for Puppeteer bot construction.
🧱 Foundation - Puppeteer Bot Starter Kit
Update:
Currently working on https://masqueradejs.com to replace this project, as it is now quite out of date. In the meantime, check out https://github.com/clouedoc/puppeteer-boiler, which is similar and actively updated.
What it is?
Foundation is intended as a simple entry-point / template for developers new to designing Puppeteer bots.
It uses the (in)famous Puppeteer-Extra package as the primary Puppeteer driver to enable its library of Stealth plugins and evasions.
PS: If you're working on botting and looking for a great developer community, check out the Puppeteer-Extra Discord server: https://discord.gg/vz7PeKk
Foundation tries to avoid wrapping existing libraries and does not "add" much that doesn't already exist, but starting a new project with an unfamiliar library can come with a lot of questions around project structure and tooling.
This attempts to solve these issues with ready-to-go scaffolding; however, it should be noted that the structure is just, like, my opinion man... and should be considered under heavy flux.
However, breaking changes shouldn't matter, because it's only intended as a starting point and you should take it in whatever direction makes sense.
"Ok, but I've come from Selenium / Python?"
If you're new to both modern JavaScript (ES6 & TypeScript) and Puppeteer, here's a quick rundown:
Newbie Guide To Scraping With Puppeteer
Installation
Note for Windows users: This project does not include cross-env, so using WSL and Terminal Preview is essentially a requirement.
🎬 Download and init
Automatic
$ git clone https://github.com/prescience-data/foundation.git && cd ./foundation # Clone the project
$ npm run init
Manual
The automatic version runs the following commands:
$ git clone https://github.com/prescience-data/foundation.git && cd ./foundation # Clone the project
$ npm run update # Updates the package.json file dependencies to latest versions
$ npm install --loglevel=error # Installs dependencies
$ npm run db:init # Initialises a sqlite database
$ npm run build:clean # Build the TypeScript code
👨‍🔧 Configure
Edit the .env to your liking and add any services like Google Cloud Logging etc.
Remember to add your .env file to .gitignore and run git rm --cached .env before committing to any public repositories.
⛷ Build / Run
The project is TypeScript so there are a few commands provided for this.
$ npm run build:clean # Just build the TypeScript files
or...
$ npm run bot # Builds the app and runs your entrypoint file
Project Structure
The project is split into two distinct parts, core and app.
This allows you to develop a quasi-framework in the Core concern that you can re-use between projects, while keeping all project-specific code within the App concern.
Config
core/config.ts
.env
The project uses a .env file in the root to define most of the common environment variables, but you can load these from a database etc. if you prefer.
The main Puppeteer LaunchOptions are defined in the config.ts file.
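As a rough illustration of how .env values might feed the launch options, here is a minimal sketch. The variable names (HEADLESS, CHROME_PATH, SLOW_MO) and the loadConfig helper are hypothetical and not taken from core/config.ts; headless, executablePath and slowMo are real Puppeteer launch option names.

```typescript
// Hypothetical shape: a small subset of Puppeteer's launch options.
interface LaunchConfig {
  headless: boolean
  executablePath: string | undefined
  slowMo: number
}

// A plain key/value map, the same shape as `process.env`.
type Env = Record<string, string | undefined>

// Build launch options from environment values, falling back to defaults.
// The variable names read here are assumptions for illustration only.
const loadConfig = (env: Env): LaunchConfig => {
  const slowMo = Number(env["SLOW_MO"] ?? "0")
  return {
    // Anything other than the literal string "false" keeps headless mode on.
    headless: env["HEADLESS"] !== "false",
    // Empty or missing values become undefined (no custom executable).
    executablePath: env["CHROME_PATH"] || undefined,
    // Ignore values that are not finite, non-negative numbers.
    slowMo: Number.isFinite(slowMo) && slowMo >= 0 ? slowMo : 0,
  }
}
```

In a Node process you would call it as loadConfig(process.env) after your .env loader has run, then pass the result to your browser launcher.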
🤖 Bot
app/bot.ts
Main self-executing function entry-point.
This is where you cleanly execute each part of your scoped logic from the modules section.
Make some magic happen ✨...
You call this module from the cli with:
$ npm run bot
Cli Arguments
You may wish to add cli arguments to direct the code in specific directions:
$ npm run bot -- --command=<CommandName>
Or if you prefer to shorten your cli commands further, you can add to your package.json scripts:
{
  "scripts": {
    "bot:moon-prism-power": "npm run bot -- --command=moon-prism-power"
  }
}
$ npm run bot:moon-prism-power ✨✨✨✨
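If you're wondering how a --command=<CommandName> flag could be picked up inside the entry-point, here is one possible sketch. The parseFlags and runCommand helpers and the commands map are hypothetical names for illustration, not the project's actual implementation.

```typescript
// Parse `--key=value` style flags (e.g. `--command=moon-prism-power`) into a map.
// Arguments that do not match that shape are ignored.
const parseFlags = (argv: string[]): Record<string, string | undefined> => {
  const flags: Record<string, string | undefined> = {}
  for (const arg of argv) {
    const match = /^--([^=]+)=(.*)$/.exec(arg)
    if (match) {
      const [, key, value] = match
      if (key !== undefined && value !== undefined) {
        flags[key] = value
      }
    }
  }
  return flags
}

// Hypothetical command handlers keyed by command name.
const commands: Record<string, (() => Promise<string>) | undefined> = {
  "moon-prism-power": async () => "transformed",
}

// Look up and run the requested command, failing loudly on unknown names.
const runCommand = async (argv: string[]): Promise<string> => {
  const command = parseFlags(argv)["command"]
  const handler = command ? commands[command] : undefined
  if (!handler) {
    throw new Error(`Unknown or missing command: ${command ?? "(none)"}`)
  }
  return handler()
}
```

With npm run bot -- --command=moon-prism-power, the arguments after the bare -- are forwarded to the script, so runCommand(process.argv.slice(2)) would receive them.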
Business Logic
app/modules/<name>.ts
Your bot logic should be defined in clear logical scopes within the app/modules folder. It's best to keep things neat and abstracted from the start to avoid huge, confusing, single-file blobs as your bot grows.
It might seem like overkill to abstract logic out at the start (which may be true for very simple bots), but you'll notice very quickly how bloated a modestly complete bot can get.
👨‍🔬 Detection Tests
core/tests/<name>.ts
A large part of building your bot is rapidly testing it against known detection code.
Long-term, you'll want to develop your own internal tests by de-obfuscating the vendor code of your target; for rapid early development, however, using hosted ones is fine.
You can use the existing detection tests provided, or build your own using the basic template provided.
Example
export const PixelScan: PageLogic = async (page: Page): Promise<Record<string, any>> => {
  // Load the test page.
  await page.goto("https://pixelscan.net", { waitUntil: "networkidle2" })
  await page.waitForTimeout(1500)
  // Extract the result element text.
  const element = await page.$("#consistency h1")
  if (!element) {
    throw new ElementNotFoundError(`Heading Tag`, element)
  }
  const result = (
    await page.evaluate((element) => element.textContent, element)
  ).replace(/\s/g, " ").trim()
  // Notify and return result.
  return { result: result }
}
If you add new tests, remember to add them to the index.ts index so you can import all tests together if needed, and to the main run.ts file to allow cli access.
Running Detection Tests
To run your tests, use the command:
$ npm run tests -- --page=sannysoft
Available Tests
- DataDome: npm run tests -- --page=datadome
- FingerprintJS Pro: npm run tests -- --page=fingerprintjs
- AreYouHeadless: npm run tests -- --page=headless
- PixelScan: npm run tests -- --page=pixelscan
- ReCAPTCHA: npm run tests -- --page=recaptcha
- SannySoft: npm run tests -- --page=sannysoft
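One way a runner could map the --page value onto a test function is a simple registry lookup. This is a sketch only: PageLike is a stand-in for Puppeteer's Page so the snippet runs without a browser, PageLogic is redefined locally, and the tests map, the runTest helper and the stubbed result are hypothetical rather than the project's real code.

```typescript
// Minimal stand-in for the parts of Puppeteer's Page that a test needs.
// (Assumption: a real project would import `Page` from Puppeteer instead.)
interface PageLike {
  goto(url: string): Promise<unknown>
}

// Local stand-in for the project's PageLogic type.
type PageLogic = (page: PageLike) => Promise<Record<string, unknown>>

// Hypothetical registry mapping `--page=<name>` values to test functions.
const tests: Record<string, PageLogic | undefined> = {
  sannysoft: async (page) => {
    await page.goto("https://bot.sannysoft.com")
    // Placeholder result; a real test would extract values from the page.
    return { result: "loaded" }
  },
}

// Resolve a test by name and run it, listing valid names on failure.
const runTest = async (
  name: string,
  page: PageLike
): Promise<Record<string, unknown>> => {
  const test = tests[name]
  if (!test) {
    const known = Object.keys(tests).join(", ")
    throw new Error(`Unknown test "${name}". Available: ${known}`)
  }
  return test(page)
}
```

Keeping the registry as a plain object means adding a new test is a one-line change, and the error message doubles as a list of valid --page values.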
🧰 Utils
core/utils.ts
Aim to keep all your small, highly re-used utility functions in a single place.
- rand(min: number, max: number, precision?: number) Returns a random number from a range.
- delay(min: number, max: number) Shortcuts the rand method to return an options-ready object.
- whitespace(value: string) Strips all duplicate whitespace and trims the string.
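The helpers listed above could look roughly like the following. This is a guess at their behaviour based only on the descriptions, not the code in core/utils.ts; in particular, the { delay } return shape is an assumption chosen because Puppeteer's page.type and page.click accept a delay option.

```typescript
// Return a random number between min and max, rounded to `precision` decimal places.
const rand = (min: number, max: number, precision: number = 0): number => {
  const factor = 10 ** precision
  const value = Math.random() * (max - min) + min
  return Math.round(value * factor) / factor
}

// Wrap a random delay in an object that can be passed as an options argument,
// e.g. `page.type(selector, text, delay(50, 150))`.
const delay = (min: number, max: number): { delay: number } => ({
  delay: rand(min, max),
})

// Collapse runs of whitespace into single spaces and trim both ends.
const whitespace = (value: string): string =>
  value.replace(/\s+/g, " ").trim()
```

Randomising the delay per call keeps typing and clicking cadence from looking perfectly uniform, which is the usual reason for a helper like this.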
🖥 Browsers
core/browsers/<browser>.ts
Regular Browsers
All regular browsers are auto-loaded with the Stealth plugin.
- Chrome Using executable path. https://www.google.com/intl/en_au/chrome/
- Brave Using executable path. https://brave.com/
- Edge Using executable path. (Not available on Linux hosts) https://www.microsoft.com/en-us/edge
Fancy Browsers
- Browserless https://docs.browserless.io/
- MultiLogin http://docs.multilogin.com/l/en/article/tkhr0ky2s6-puppeteer-browser-automation
- Incogniton https://incogniton.com/knowledge%20center/selenium-browser-automation
Examples
Chrome
// Using Chrome via the executable.
import { Chrome } from "../core/browsers"
const browser: Browser = await Chrome()
const page: Page = await browser.newPage()
MultiLogin
// Using MultiLogin with a profile id.
import { MultiLogin } from "../core/browsers"
const browser: Browser = await MultiLogin({ profileId: "fa3347ae-da62-4013-bcca-ef30825c9311" })
const page: Page = await browser.newPage()
Browserless
// Using Browserless with an api token.
import { Browserless } from "../core/browsers"
const browser: Browser = await Browserless(env.BROWSERLESS_TOKEN)
const page: Page = await browser.newPage()
💾 Storage
storage/profiles/<uuid>
Local storage folder for switching Chrome profiles.
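To make the storage/profiles/<uuid> layout concrete, here is a small sketch of resolving a profile folder from an id. The profileDir helper and its validation rule are hypothetical; the project may resolve profile paths differently.

```typescript
// Loose check for the canonical 8-4-4-4-12 hexadecimal UUID layout.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Build the profile directory path for a given profile id.
// Rejecting non-UUID input also guards against path traversal like "../x".
const profileDir = (
  uuid: string,
  root: string = "storage/profiles"
): string => {
  if (!UUID_PATTERN.test(uuid)) {
    throw new Error(`Invalid profile id: ${uuid}`)
  }
  return `${root}/${uuid.toLowerCase()}`
}
```

The returned path could then be passed as Puppeteer's userDataDir launch option so that a given profile's cookies and local storage persist between runs.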
📦 Database
core/services/db.ts
prisma/schema.prisma
Uses the fantastic Prisma database abstraction library with a simple sqlite database, but this can easily be configured for any local or remote RDBMS or keystore database.
https://www.prisma.io
Commands
$ npm run db:init # Wipes the database and regenerates types and migrations
$ npm run db:migrate # Creates migrations
$ npm run db:migrate:refresh # Long version of init
$ npm run db:generate # Generates fresh prisma files
Example
import { db } from "../core/services"
;(async () => {
  // Bot execution code...
  // If a result was returned, store it in the database.
  if (result) {
    // Prisma queries are lazy and only execute once awaited.
    await db.scrape.create({
      data: {
        url: "https://www.startpage.com/en/privacy-policy/",
        html: result,
      },
    })
  }
})()
Additionally, you can build out shortcut methods in the database folder to DRY out common database transactions.
/**
 * Basic Prisma abstraction for a common task.
 *
 * @param {string} url
 * @param {string | Record<string, any>} data
 * @return {Promise<void>}
 */
export const storeScrape = async (
  url: string,
  data: string | Record<string, any>
): Promise<void> => {
  // Flatten any objects passed in.
  if (typeof data !== "string") {
    data = JSON.stringify(data)
  }
  // Store the data. Awaiting is required for the query to execute.
  await db.scrape.create({
    data: {
      url: url,
      data: data,
    },
  })
}
Logging
core/services/logger.ts
Uses Winston to handle logging and output. Can be configured to transport to console, file, or a third-party transport like Google Cloud Logging (provided).
Check the docs here to extend or configure transports / switch out completely.
- Winston https://github.com/winstonjs/winston
- Google Cloud Logging https://cloud.google.com/logging/docs
- Bugsnag https://docs.bugsnag.com/platforms/javascript/
Google Cloud Logging configuration
To set up Google Cloud Logging, you'll need a service account with Logs Writer and Monitoring Metric Writer permissions.
Guide:
- Create a GCP project https://console.cloud.google.com
- Enable the Cloud Logging API
- Create a service account
  - required roles:
    - Logging > Logs Writer
    - Monitoring > Monitoring Metric Writer
- Add a JSON key to the service account and download it to resources/google
- Make sure to edit the .env to match your service account key's filename (GOOGLE_LOGGING_KEYFILE property)
Tooling
The project comes preconfigured with the following tooling to keep your code neat and readable. Make sure to configure your IDE to pick up the configs.
- Prettier
  - https://prettier.io
  - https://github.com/prettier/prettier
  - https://marketplace.visualstudio.com/items?itemName=esbenp.prettier-vscode
- ESLint
  - https://eslint.org
  - https://github.com/eslint/eslint
  - https://marketplace.visualstudio.com/items?itemName=dbaeumer.vscode-eslint
  - https://www.digitalocean.com/community/tutorials/linting-and-formatting-with-eslint-in-vs-code
Work In Progress
🤷 Any contributions on this would be much appreciated!
- [ ] Writing Mocha tests
- [ ] More demos!
- [ ] Define other database systems eg Firebase
- [ ] Containerize with Docker
- [ ] Write mouse movement recorder and database storage driver
- [ ] Add ghost-cursor to demo
- [ ] Apply optional world isolation
- [ ] Add emojis to logger
- [ ] Migrate css selectors to xpath