
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

Results 43 dude issues

Output is always flattened into a single list of dictionaries before saving to CSV, JSON, etc. By grouping data into separate tables, it will be easier to post-process and merge...

enhancement
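The grouping idea in the issue above could be sketched as follows. This is only an illustration, not how dude saves output: the `_group` key and the `group_records` helper are hypothetical names introduced here, and the sketch assumes each scraped record carries a marker naming the table it belongs to.

```python
from collections import defaultdict


def group_records(records, key="_group"):
    """Split a flat list of dicts into separate tables keyed by `key`.

    `key` is a hypothetical marker field, not something dude emits.
    """
    tables = defaultdict(list)
    for record in records:
        # Copy so the caller's dicts are left untouched, then drop the marker.
        row = dict(record)
        name = row.pop(key, "default")
        tables[name].append(row)
    return dict(tables)


flat = [
    {"_group": "books", "title": "A"},
    {"_group": "authors", "name": "X"},
    {"_group": "books", "title": "B"},
]
tables = group_records(flat)
```

Each resulting table could then be written to its own CSV or JSON file and merged later on a shared column.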

It will be easier to extract individual table cell data if `colspan` and `rowspan` are exploded into single cells. Add options (`--explode-rowspan`, `--explode-colspan` and/or `--explode-table-cells`) to enable this. For example, this...

enhancement
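To make the proposal above concrete, here is a minimal sketch of what "exploding" spans could mean. It is not dude's implementation: the `explode_cells` helper and the `(text, rowspan, colspan)` tuple representation are assumptions made for illustration, and real HTML parsing is left out.

```python
def explode_cells(rows):
    """Expand (text, rowspan, colspan) cells into a grid of plain text cells."""
    grid = []
    pending = {}  # (row_index, col_index) -> text carried down by a rowspan
    for r, row in enumerate(rows):
        out = []
        c = 0
        cells = iter(row)
        while True:
            # Positions already claimed by a rowspan from an earlier row come first.
            if (r, c) in pending:
                out.append(pending.pop((r, c)))
                c += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text, rowspan, colspan = cell
            for dc in range(colspan):
                # Repeat the text across every column the cell spans...
                out.append(text)
                # ...and reserve the same columns in the rows it spans.
                for dr in range(1, rowspan):
                    pending[(r + dr, c + dc)] = text
            c += colspan
        grid.append(out)
    return grid


table = [
    [("A", 2, 1), ("B", 1, 2)],
    [("C", 1, 1), ("D", 1, 1)],
]
grid = explode_cells(table)
# grid == [["A", "B", "B"], ["A", "C", "D"]]
```

Repeating the cell text in every covered position is one possible policy; leaving the extra positions empty would be an equally valid choice.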

A good reference: https://github.com/TeamHG-Memex/MaybeDont/blob/master/maybedont/predict.py

enhancement

There are existing ways to extract data from JSON without traversing the contents one by one.

## Options

- [JsonPath](https://goessner.net/articles/JsonPath/)
- [JMESPath](https://jmespath.org/)

## Proposed style

```python
@select(jsonpath="$.store.book[0].title")
def extract_title(title):
    return...
```

enhancement
help wanted
good first issue
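As a rough illustration of the path-based extraction discussed above, the sketch below resolves a tiny subset of JsonPath-like syntax (`$`, dotted keys, and `[n]` indices) using only the standard library. It is a stand-in, not JsonPath or JMESPath themselves, and `simple_path` is a hypothetical helper, not part of dude.

```python
import json
import re

# One token is either ".name" / "name" or "[index]".
_TOKEN = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def simple_path(expression, data):
    """Resolve `$`, dotted keys, and `[n]` indices against parsed JSON."""
    if not expression.startswith("$"):
        raise ValueError("expression must start with '$'")
    rest = expression[1:]
    pos = 0
    current = data
    while pos < len(rest):
        match = _TOKEN.match(rest, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at: {rest[pos:]!r}")
        key, index = match.groups()
        # Step into a dict by key or into a list by position.
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


document = json.loads('{"store": {"book": [{"title": "Sayings of the Century"}]}}')
title = simple_path("$.store.book[0].title", document)
```

A real implementation would more likely delegate to one of the libraries listed under Options, which also support filters, wildcards, and slices.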

An option to run only or skip specific handler functions

```bash
dude scrape ... --run-only
dude scrape ... --skip
```

Ideas:

- By parameter to `@select()`.
- By function name.

enhancement
help wanted
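The "by function name" idea above could look something like the sketch below. It is only an assumption about one possible design: `filter_handlers` and its `run_only` and `skip` parameters are hypothetical names, and the handler functions are empty placeholders.

```python
def filter_handlers(handlers, run_only=None, skip=None):
    """Select handler functions by name.

    With `run_only`, keep only the named handlers; with `skip`, drop them.
    Both arguments are optional sets of function names.
    """
    selected = []
    for handler in handlers:
        name = handler.__name__
        if run_only and name not in run_only:
            continue
        if skip and name in skip:
            continue
        selected.append(handler)
    return selected


# Placeholder handlers standing in for decorated scraper functions.
def extract_title(): ...
def extract_price(): ...
def extract_author(): ...


handlers = [extract_title, extract_price, extract_author]
only = filter_handlers(handlers, run_only={"extract_title", "extract_price"})
skipped = filter_handlers(handlers, skip={"extract_price"})
```

Filtering by name needs no change to existing handlers, whereas a parameter on `@select()` would let several functions share one label.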

https://curl.se/docs/manpage.html#URL

enhancement

Selenium proxy is not yet implemented:
https://github.com/roniemartinez/dude/blob/169181386063a90d83d3b0b985f92a2c47a1d28c/dude/optional/selenium_scraper.py#L142
https://github.com/roniemartinez/dude/blob/169181386063a90d83d3b0b985f92a2c47a1d28c/dude/optional/selenium_scraper.py#L184

enhancement

https://github.com/AtuboDad/playwright_stealth

enhancement