foundation
A uniform template to use as a foundation for Puppeteer bot construction.
Foundation - Puppeteer Bot Starter Kit
Update:
I'm currently working on https://masqueradejs.com to replace this project, as it is quite a bit out of date now. In the meantime, you can check out https://github.com/clouedoc/puppeteer-boiler, which is similar and actively updated.
What is it?
Foundation is intended as a simple entry-point / template for developers new to designing Puppeteer bots.
It uses the (in)famous Puppeteer-Extra package as the primary Puppeteer driver to enable its library of Stealth plugins and evasions.
PS: If you're working on botting and looking for a great developer community, check out the Puppeteer-Extra Discord server: https://discord.gg/vz7PeKk
Foundation tries to avoid wrapping existing libraries and does not "add" much that doesn't already exist, but starting a new project with an unfamiliar library can come with a lot of questions around project structure and tooling.
This attempts to solve these issues with ready-to-go scaffolding. However, it should be noted that the structure is just, like, my opinion man... and should be considered under heavy flux.
Breaking changes shouldn't matter, though, because it's only intended as a starting point and you should take it in whatever direction makes sense.
"Ok, but I've come from Selenium / Python?"
If you're new to both modern JavaScript (ES6 & TypeScript) and Puppeteer, here's a quick rundown:
Newbie Guide To Scraping With Puppeteer
Installation
Note for Windows users: This project does not include cross-env, so using WSL and Terminal Preview is essentially a requirement.
Download and init
Automatic
$ git clone https://github.com/prescience-data/foundation.git && cd ./foundation # Clone the project
$ npm run init
Manual
The automatic version runs the following commands:
$ git clone https://github.com/prescience-data/foundation.git && cd ./foundation # Clone the project
$ npm run update # Updates the package.json file dependencies to latest versions
$ npm install --loglevel=error # Installs dependencies
$ npm run db:init # Initialises a sqlite database
$ npm run build:clean # Build the TypeScript code
Configure
Edit the .env to your liking and add any services, such as Google Cloud Logging.
Remember to .gitignore and git rm --cached your .env file before committing to any public repositories.
Build / Run
The project is TypeScript so there are a few commands provided for this.
$ npm run build:clean # Just build the TypeScript files
or...
$ npm run bot # Builds the app and runs your entrypoint file
Project Structure
The project is split into two distinct parts, core and app.
This allows you to develop a quasi-framework that you can re-use between projects in the Core concern, while keeping all project-specific code within the App concern.
Config
core/config.ts
.env
The project uses a .env file in the root to define most of the common environment variables, but you can load these from a database or another source if you prefer.
The main Puppeteer LaunchOptions are defined in the config.ts file.
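As a rough illustration of how .env values might feed the launch options, here is a minimal sketch. The variable names (HEADLESS, CHROME_EXECUTABLE_PATH, CHROME_ARGS, SLOW_MO) and the buildLaunchOptions helper are hypothetical and not taken from the project. The returned fields (headless, executablePath, args, slowMo) mirror standard Puppeteer launch options, declared locally here so the example stays self-contained.

```typescript
// Local stand-in for a subset of Puppeteer's LaunchOptions.
interface LaunchOptionsSketch {
  headless: boolean
  executablePath?: string
  slowMo?: number
  args: string[]
}

type Env = Record<string, string | undefined>

// Builds launch options from environment-style key/value pairs.
// Every variable name below is an assumption made for illustration.
const buildLaunchOptions = (env: Env): LaunchOptionsSketch => {
  const options: LaunchOptionsSketch = {
    // Run headless unless HEADLESS is explicitly set to "false".
    headless: env.HEADLESS !== "false",
    // Split a space-separated flag list, dropping empty entries.
    args: (env.CHROME_ARGS ?? "").split(/\s+/).filter(Boolean),
  }
  if (env.CHROME_EXECUTABLE_PATH) {
    options.executablePath = env.CHROME_EXECUTABLE_PATH
  }
  // Only apply slowMo when the value parses as a finite number.
  const slowMo = Number(env.SLOW_MO)
  if (env.SLOW_MO && Number.isFinite(slowMo)) {
    options.slowMo = slowMo
  }
  return options
}
```

In a real project the argument would likely be process.env after the .env file has been loaded, and the result would be passed to the browser launcher.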
Bot
app/bot.ts
Main self-executing function entry-point.
This is where you execute each part of your scoped logic from the modules section cleanly.
Make some magic happen...
You call this module from the CLI with:
$ npm run bot
CLI Arguments
You may wish to add CLI arguments to direct the code in specific directions:
$ npm run bot -- --command=<CommandName>
Or, if you prefer to shorten your CLI commands further, you can add shortcuts to your package.json scripts:
{
"scripts": {
"bot:moon-prism-power": "npm run bot -- --command=moon-prism-power"
}
}
$ npm run bot:moon-prism-power
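One way the entry point could route a --command flag is sketched below. This is only an illustration: parseFlags, runCommand, and the command table are hypothetical names, and the project itself may well use an argument-parsing library instead.

```typescript
// Parses `--key=value` style flags into a plain object.
const parseFlags = (argv: string[]): Record<string, string> => {
  const flags: Record<string, string> = {}
  for (const arg of argv) {
    const match = /^--([^=]+)=(.*)$/.exec(arg)
    if (match) {
      flags[match[1]] = match[2]
    }
  }
  return flags
}

// Hypothetical command table; each handler stands in for a scoped module call.
const commands: Record<string, () => string> = {
  "moon-prism-power": () => "ran moon-prism-power",
  default: () => "ran default flow",
}

// Picks the handler named by --command, falling back to "default".
const runCommand = (argv: string[]): string => {
  const { command = "default" } = parseFlags(argv)
  const handler = commands[command]
  if (!handler) {
    throw new Error(`Unknown command: ${command}`)
  }
  return handler()
}
```

In the entry point this would typically be called as runCommand(process.argv.slice(2)), since npm forwards everything after the -- separator to the script.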
Business Logic
app/modules/<name>.ts
Your bot logic should be defined in clear logical scopes within the app/modules folder. It's best to keep things neat and abstracted from the start to avoid huge, confusing, single-file blobs as your bot grows.
It might seem like overkill to abstract logic out at the start (which may be true for very simple bots), but you'll notice very quickly how bloated a modestly complete bot can get.
Detection Tests
core/tests/<name>.ts
A large part of building your bot is rapidly testing it against known detection code.
Long-term, you'll want to develop your own internal tests by de-obfuscating the vendor code of your target. For rapid early development, however, using hosted ones is fine.
You can use the existing detection tests provided, or build your own using the basic template provided.
Example
export const PixelScan: PageLogic = async (page: Page): Promise<Record<string, any>> => {
  // Load the test page.
  await page.goto("https://pixelscan.net", { waitUntil: "networkidle2" })
  await page.waitForTimeout(1500)
  // Extract the result element text.
  const element = await page.$("#consistency h1")
  if (!element) {
    throw new ElementNotFoundError(`Heading Tag`, element)
  }
  // textContent can be null, so fall back to an empty string before tidying whitespace.
  const result = (
    await page.evaluate((element) => element.textContent ?? "", element)
  ).replace(/\s+/g, " ").trim()
  // Return the result.
  return { result }
}
If you add new tests, remember to add them to the index.ts index so you can import all tests together if needed, and to the main run.ts file to allow CLI access.
Running Detection Tests
To run your tests, use the command:
$ npm run tests -- --page=sannysoft
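The --page flag implies some lookup from a name to a test function. A minimal sketch of such a registry follows; the TestLogic type, the tests table, and resolveTest are hypothetical stand-ins rather than the project's actual run.ts, and the placeholder tests do not open a browser.

```typescript
// Simplified stand-in for the project's PageLogic type, which takes a Puppeteer Page.
type TestLogic = () => Promise<Record<string, unknown>>

// Registry keyed by the lowercase name used on the command line.
const tests: Record<string, TestLogic> = {
  sannysoft: async () => ({ result: "placeholder" }),
  pixelscan: async () => ({ result: "placeholder" }),
}

// Resolves the test named by `--page=<name>` or fails with a helpful message.
const resolveTest = (argv: string[]): TestLogic => {
  const flag = argv.find((arg) => arg.startsWith("--page="))
  if (!flag) {
    throw new Error("Missing --page=<name>")
  }
  const name = flag.slice("--page=".length).toLowerCase()
  const test = tests[name]
  if (!test) {
    const available = Object.keys(tests).join(", ")
    throw new Error(`Unknown test "${name}". Available: ${available}`)
  }
  return test
}
```

Keeping the registry in one place means a new test only needs one extra entry to become reachable from the command line.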
Available Tests
- DataDome: npm run tests -- --page=datadome
- FingerprintJS Pro: npm run tests -- --page=fingerprintjs
- AreYouHeadless: npm run tests -- --page=headless
- PixelScan: npm run tests -- --page=pixelscan
- ReCAPTCHA: npm run tests -- --page=recaptcha
- SannySoft: npm run tests -- --page=sannysoft
Utils
core/utils.ts
Aim to keep all your small, highly re-used utility functions in a single place.
- rand(min: number, max: number, precision?: number) Returns a random number from a range.
- delay(min: number, max: number) Shortcuts the rand method to return an options-ready object.
- whitespace(value: string) Strips all duplicate whitespace and trims the string.
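The three helpers above could look roughly like the sketch below. The signatures come from the list, but the bodies are guesses: in particular, the { delay } return shape is an assumption about what "options-ready" means, and the project's own implementations may differ.

```typescript
// Returns a random number between min and max, rounded to `precision` decimal places.
const rand = (min: number, max: number, precision = 0): number => {
  const factor = 10 ** precision
  const value = Math.random() * (max - min) + min
  return Math.round(value * factor) / factor
}

// Assumption: "options-ready" means an object shaped like { delay: number },
// which can be handed to Puppeteer calls that accept a delay option.
const delay = (min: number, max: number): { delay: number } => ({
  delay: rand(min, max),
})

// Collapses runs of whitespace into single spaces and trims both ends.
const whitespace = (value: string): string =>
  value.replace(/\s+/g, " ").trim()
```

Under that assumption, a call such as page.type("#search", "query", delay(50, 150)) would type with a randomised per-key delay.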
Browsers
core/browsers/<browser>.ts
Regular Browsers
All regular browsers are auto-loaded with the Stealth plugin.
- Chrome Using executable path. https://www.google.com/intl/en_au/chrome/
- Brave Using executable path. https://brave.com/
- Edge Using executable path. (Not available on Linux hosts) https://www.microsoft.com/en-us/edge
Fancy Browsers
- Browserless https://docs.browserless.io/
- MultiLogin http://docs.multilogin.com/l/en/article/tkhr0ky2s6-puppeteer-browser-automation
- Incogniton https://incogniton.com/knowledge%20center/selenium-browser-automation
Examples
Chrome
// Using Chrome via the executable.
// Assumes the browsers index exposes each launcher as a named export.
import { Browser, Page } from "puppeteer"
import { Chrome } from "../core/browsers"
const browser: Browser = await Chrome()
const page: Page = await browser.newPage()
MultiLogin
// Using MultiLogin with a profile id.
import { Browser, Page } from "puppeteer"
import { MultiLogin } from "../core/browsers"
const browser: Browser = await MultiLogin({ profileId: "fa3347ae-da62-4013-bcca-ef30825c9311" })
const page: Page = await browser.newPage()
Browserless
// Using Browserless with an api token.
import { Browser, Page } from "puppeteer"
import { Browserless } from "../core/browsers"
const browser: Browser = await Browserless(env.BROWSERLESS_TOKEN)
const page: Page = await browser.newPage()
Storage
storage/profiles/<uuid>
Local storage folder for switching Chrome profiles.
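To illustrate how a profile id might map to a folder, here is a small sketch. The profileDir helper and its validation are hypothetical; only the storage/profiles/<uuid> layout comes from this README. The resulting path is the kind of value Puppeteer's userDataDir launch option expects.

```typescript
// Matches a canonical UUID; this also rejects ids containing path separators.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Builds the profile folder path for a given profile id.
const profileDir = (uuid: string, root = "storage/profiles"): string => {
  if (!UUID_PATTERN.test(uuid)) {
    throw new Error(`Invalid profile id: ${uuid}`)
  }
  return `${root}/${uuid.toLowerCase()}`
}

// The path can then be supplied as the `userDataDir` launch option.
const launchOptions = {
  userDataDir: profileDir("fa3347ae-da62-4013-bcca-ef30825c9311"),
}
```

Switching profiles then amounts to launching the browser with a different id.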
Database
core/services/db.ts
prisma/schema.prisma
Uses the fantastic Prisma database abstraction library with a simple SQLite database, but this can easily be configured for any local or remote RDBMS or key-value store.
https://www.prisma.io
Commands
$ npm run db:init # Wipes the database and regenerates types and migrations
$ npm run db:migrate # Creates migrations
$ npm run db:migrate:refresh # Long version of init
$ npm run db:generate # Generates fresh prisma files
Example
import { db } from "../core/services"
;(async () => {
  // Bot execution code...
  // If a result was returned, store it in the database.
  if (result) {
    // Prisma queries only execute once awaited.
    await db.scrape.create({
      data: {
        url: "https://www.startpage.com/en/privacy-policy/",
        html: result,
      },
    })
  }
})()
Additionally, you can build out shortcut methods in the database folder to DRY out common database transactions.
/**
 * Basic Prisma abstraction for a common task.
 *
 * @param {string} url
 * @param {string | Record<string, any>} data
 * @return {Promise<void>}
 */
export const storeScrape = async (
  url: string,
  data: string | Record<string, any>
): Promise<void> => {
  // Flatten any objects passed in.
  if (typeof data !== "string") {
    data = JSON.stringify(data)
  }
  // Store the data (awaited so the query actually runs before returning).
  await db.scrape.create({
    data: {
      url: url,
      data: data,
    },
  })
}
Logging
core/services/logger.ts
Uses Winston to handle logging and output. Can be configured to transport to console, file, or a third-party transport like Google Cloud Logging (provided).
Check the docs here to extend or configure transports / switch out completely.
- Winston https://github.com/winstonjs/winston
- Google Cloud Logging https://cloud.google.com/logging/docs
- Bugsnag https://docs.bugsnag.com/platforms/javascript/
Google Cloud Logging configuration
To set up Google Cloud Logging, you'll need a service account with Logs Writer and Monitoring Metric Writer permissions.
Guide:
- Create a GCP project https://console.cloud.google.com
- Enable the Cloud Logging API
- Create a service account
  - required roles:
    - Logging > Logs Writer
    - Monitoring > Monitoring Metric Writer
- Add a JSON key to the service account and download it to resources/google
- Make sure to edit the .env to match your service account key's filename (GOOGLE_LOGGING_KEYFILE property)
Tooling
The project comes preconfigured with the following tooling to keep your code neat and readable. Make sure to configure your IDE to pick up the configs.
- Prettier
  - https://prettier.io
  - https://github.com/prettier/prettier
  - https://marketplace.visualstudio.com/items?itemName=esbenp.prettier-vscode
- ESLint
  - https://eslint.org
  - https://github.com/eslint/eslint
  - https://marketplace.visualstudio.com/items?itemName=dbaeumer.vscode-eslint
  - https://www.digitalocean.com/community/tutorials/linting-and-formatting-with-eslint-in-vs-code
Work In Progress
Any contributions on this would be much appreciated!
- [ ] Writing Mocha tests
- [ ] More demos!
- [ ] Define other database systems eg Firebase
- [ ] Containerize with Docker
- [ ] Write mouse movement recorder and database storage driver
- [ ] Add ghost-cursor to demo
- [ ] Apply optional world isolation
- [ ] Add emojis to logger
- [ ] Migrate css selectors to xpath