datasette-scraper

Add website scraping abilities to Datasette

Results 8 datasette-scraper issues

I installed `datasette-scraper`, updated `metadata.json` and started Datasette as root. When I try to view `dss_crawl` and a few others, I see errors like ```Traceback (most recent call last): File...

Could datasette-scraper be made to handle pages that require JavaScript interaction (e.g. infinite scroll, clicking elements to load other parts of the page)? I've tested using Playwright for this with...

`discover_urls` is meant to return an iterable of URLs; if a user has returned a string, they're probably doing something wrong, so complain. (Otherwise, we'll enqueue each character in the...
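A minimal sketch of the kind of check this issue asks for. The helper name `validate_discovered_urls` is hypothetical and not part of datasette-scraper; it only illustrates rejecting a bare string before anything is enqueued.

```python
from collections.abc import Iterable


def validate_discovered_urls(result):
    """Return the hook result as a list, rejecting a bare string.

    Hypothetical helper: the name and where it would be called from are
    assumptions, not datasette-scraper's actual code.
    """
    # A str is itself iterable, so iterating it would yield one character
    # at a time. Catch it explicitly before treating the result as a
    # collection of URLs.
    if isinstance(result, (str, bytes)):
        raise TypeError(
            "discover_urls must return an iterable of URLs, not a single "
            f"string: {result!r}"
        )
    if not isinstance(result, Iterable):
        raise TypeError(
            f"discover_urls must return an iterable, got {type(result).__name__}"
        )
    return list(result)
```

With this in place, returning `"https://example.com/"` raises a `TypeError`, while returning `["https://example.com/"]` passes through unchanged.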

You can access a store's product catalogue at `{url}/products.json?limit=250&page={page}`; it returns an empty array when there are no more products. You can access a given product's data at `{url}/products/product-handle.json` ```jsonc // Extract Shopify...
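The pagination described in this issue could be sketched roughly as follows. The function names (`fetch_json`, `iter_products`) are hypothetical, and the assumption that the response is a JSON object with a `products` key is mine rather than something the issue states, so the code also accepts a bare array.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode its body as JSON (default fetcher)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_products(store_url, fetch=fetch_json, limit=250):
    """Yield products page by page until an empty page is returned.

    `fetch` is injectable so the loop can be exercised without network
    access. Assumption: the payload is either {"products": [...]} or a
    bare list.
    """
    base = store_url.rstrip("/")
    page = 1
    while True:
        data = fetch(f"{base}/products.json?limit={limit}&page={page}")
        products = data.get("products", []) if isinstance(data, dict) else data
        if not products:
            # An empty array signals that there are no more products.
            return
        yield from products
        page += 1
```

Passing a stub for `fetch` keeps the paging logic testable in isolation; a real crawler would presumably route these requests through its own queue instead of calling `urlopen` directly.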

This will require schema changes plus something cron-esque.

TODO: name and schema are up in the air. ```jsonc // TODO: determine schema for this "extract-selectors": { // TODO: flag to indicate whether we should mark up the source...

```jsonc // Extract information like title, metadesc, author, publish date, // preview image. "extract-seo": { // optional; absent implies .* "url-regex": ".*", // optional "database": "dbname", // optional; defaults to...

I think if a user has [datasette-dashboards](https://github.com/rclement/datasette-dashboards) installed and we implement an appropriate [get_metadata](https://docs.datasette.io/en/stable/plugin_hooks.html#get-metadata-datasette-key-database-table) hook, we could render a dashboard via that mechanism. It might not make sense to depend on that plugin.