Browser Extension
get-set, Fetch! is a browser extension for scraping websites through a series of parameterizable scraping scenarios.
Currently supported browsers: Chrome, Firefox, Edge.
The most common use cases are covered by the built-in scenarios:
- Scrape Static Content: extracts text and binary content from static HTML pages based on CSS selectors.
- Scrape Dynamic Content: extracts text and binary content from dynamic (JavaScript-rendered) pages based on CSS selectors.
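Both built-in scenarios select content with CSS selectors. The TypeScript sketch below illustrates that idea under stated assumptions: `SelectorRule` and `extractBySelectors` are hypothetical names, not the extension's API, and the root is any object exposing `querySelectorAll` (a page's `document` qualifies). Text is read from `textContent`; for binary content such as images, the sketch only collects an attribute value like `src` and leaves the actual download out.

```typescript
// Minimal structural types so the sketch does not depend on lib.dom;
// in a real page, `document` and its elements satisfy them.
interface ElementLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}

interface RootLike {
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

// Hypothetical parameter shape: what to select and what to read from each match.
interface SelectorRule {
  label: string;      // key under which the results are grouped
  selector: string;   // CSS selector, e.g. "h1.title" or "img.product"
  attribute?: string; // read this attribute (e.g. "src") instead of the text content
}

// Runs every rule against the root and groups the trimmed, non-empty values by label.
function extractBySelectors(root: RootLike, rules: SelectorRule[]): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const rule of rules) {
    const values: string[] = [];
    const matches = root.querySelectorAll(rule.selector);
    for (let i = 0; i < matches.length; i++) {
      const el = matches[i];
      // Text content for textual data, attribute value (typically a URL) for binary data.
      const raw = rule.attribute ? el.getAttribute(rule.attribute) : el.textContent;
      const value = raw?.trim();
      if (value) values.push(value);
    }
    result[rule.label] = values;
  }
  return result;
}
```

In a page context, `extractBySelectors(document, [{ label: "images", selector: "img", attribute: "src" }])` would return the raw `src` attribute values of every image. For dynamic pages the same selection step applies, only after the page's scripts have rendered the content.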
You can also install community-based scenarios:
- Extract Html Headings (v0.2.0): a "Hello World" example of writing a scraping scenario.
- Extract Article Content (v0.2.0): extracts article content using the Mozilla Readability library.
If you have written a scraping scenario and want to share it, add it to the list above and open a pull request.
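A parameterizable scenario such as Extract Html Headings can be thought of as an extraction step plus default parameters that the user may override. The sketch below models that idea with a hypothetical `Scenario` interface and `runScenario` helper; they are illustrative only and are not the extension's actual scenario contract, for which the sub-package READMEs and getsetfetch.org are the reference.

```typescript
// Structural stand-ins for DOM types; a page's `document` satisfies RootLike.
interface HeadingLike {
  tagName: string;
  textContent: string | null;
}

interface RootLike {
  querySelectorAll(selector: string): ArrayLike<HeadingLike>;
}

// Hypothetical scenario contract: default parameters plus an extraction function.
interface Scenario<P> {
  name: string;
  defaultParams: P;
  extract(root: RootLike, params: P): string[][];
}

// Which heading levels to collect, e.g. [1, 2] for h1 and h2.
interface HeadingParams {
  levels: number[];
}

const extractHeadings: Scenario<HeadingParams> = {
  name: "Extract Html Headings",
  defaultParams: { levels: [1, 2, 3] },
  extract(root, params) {
    if (params.levels.length === 0) return [];
    // Build a selector list such as "h1,h2,h3"; querySelectorAll returns matches in document order.
    const selector = params.levels.map((level) => `h${level}`).join(",");
    const rows: string[][] = [];
    const matches = root.querySelectorAll(selector);
    for (let i = 0; i < matches.length; i++) {
      // One row per heading: [tag, trimmed text].
      rows.push([matches[i].tagName.toLowerCase(), (matches[i].textContent ?? "").trim()]);
    }
    return rows;
  },
};

// Merge user overrides onto the scenario defaults before running it.
function runScenario<P>(s: Scenario<P>, root: RootLike, overrides: Partial<P> = {}): string[][] {
  return s.extract(root, { ...s.defaultParams, ...overrides });
}
```

Keeping defaults inside the scenario means a user only supplies the parameters they want to change, e.g. `runScenario(extractHeadings, document, { levels: [2] })` to collect only `h2` headings.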
The extension is structured as a monorepo with the following sub-packages:
- commons: mostly TypeScript definitions
- background: parses pages and stores the relevant data in the browser's built-in database (IndexedDB)
- popup: the extension's toolbar UI
- admin: front-end for the background capabilities
- scrape-static-content: built-in scenario
- scrape-dynamic-content: built-in scenario
- extension: builds the extension files and runs a comprehensive suite of integration tests
Technical details for each sub-package can be found in its README file.
Detailed documentation, including a series of examples, is available at getsetfetch.org.