extension
Published 20 hours ago •

web scraping extension

Browser Extension

get-set, Fetch! is a browser extension for scraping sites through a series of parameterizable scraping scenarios.

Currently supported browsers: Chrome, Firefox, Edge.

The most common use cases are handled by builtin scenarios:

Scrape Static Content
- Extracts text and binary content from static HTML pages based on CSS selectors.
Scrape Dynamic Content
- Extracts text and binary content from dynamic (JavaScript-rendered) pages based on CSS selectors.
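
To make the idea of selector-based extraction concrete, here is a minimal sketch, not the extension's actual implementation. The `extractText` helper and the `Queryable` and `TextNode` types are hypothetical names introduced only for illustration. They are typed structurally so the snippet also runs outside a browser, and a real DOM `Document` satisfies them.

```typescript
// Minimal structural types (assumptions for this sketch only).
// A browser `Document` or `Element` already matches `Queryable`.
interface TextNode {
  textContent: string | null;
}

interface Queryable {
  querySelectorAll(selector: string): ArrayLike<TextNode>;
}

// Hypothetical helper, not part of the extension's API:
// collects the trimmed, non-empty text of every node matched by each selector.
function extractText(
  root: Queryable,
  selectors: string[],
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const selector of selectors) {
    result[selector] = Array.from(root.querySelectorAll(selector))
      .map((node) => (node.textContent ?? "").trim())
      .filter((text) => text.length > 0);
  }
  return result;
}
```

In a browser context, a call such as `extractText(document, ["h1", "h2"])` would return the heading texts grouped by selector. Binary content (images, for example) would need a separate step that resolves and downloads the matched URLs, which this sketch leaves out.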

You can also install community-based scenarios:

Extract Html Headings - v0.2.0
- "Hello World" example of writing a scrape scenario.
Extract Article Content - v0.2.0
- Extracts article content using the Mozilla Readability library.

If you wrote a scraping scenario and want to share it, please update the above list and make a pull request.

The extension is structured as a monorepo with the following sub-packages:

commons: mostly TypeScript definitions
background: parses pages and stores relevant data in the builtin browser database (IndexedDB)
popup: toolbar appearance
admin: front-end for the background capabilities
scrape-static-content: builtin scenario
scrape-dynamic-content: builtin scenario
extension: builds the extension files and runs a comprehensive suite of integration tests

You can find technical details in each sub-package's readme file.

Detailed documentation with a series of examples is available at getsetfetch.org.

About

web scraping extension

javascript

crawler

browser

npm

scraper

extension

indexeddb

62

Stars

7

Forks
