scrape
How can we name the generated index file after the domain?
For instance, scrape www.google.com would produce google.html, not PART01.html.
Also, do you have an example of how to use the --attributes option?
There is currently no option to do what you're describing. The current behavior is to generate a directory bearing the domain name, which is then populated with the PART.html files scraped from that domain.
I'm open to suggestions and pull requests that would change this mechanism.
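Until such an option exists, one possible workaround is to rename the files after the scrape has finished. The following is only a sketch, not part of scrape itself: it assumes the output directory is named after the domain and contains files matching PART*.html, as described above. The helper names domain_stem and rename_parts are hypothetical, and the domain handling is deliberately naive (it strips a leading "www" and keeps the first remaining label).

```python
from pathlib import Path


def domain_stem(domain: str) -> str:
    """Hypothetical helper: reduce 'www.google.com' to 'google'."""
    labels = domain.lower().split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    # Naive choice: keep the first remaining label only.
    return labels[0] if labels else domain


def rename_parts(domain_dir: Path) -> list[Path]:
    """Rename PART*.html files inside a directory named after the domain.

    Assumes the layout described in this thread; this is a
    post-processing step, not a feature of the tool.
    """
    stem = domain_stem(domain_dir.name)
    parts = sorted(domain_dir.glob("PART*.html"))
    renamed = []
    for index, part in enumerate(parts, start=1):
        # A single file becomes '<stem>.html'; several get a numeric suffix.
        name = f"{stem}.html" if len(parts) == 1 else f"{stem}_{index:02d}.html"
        target = part.with_name(name)
        part.rename(target)
        renamed.append(target)
    return renamed
```

Calling rename_parts(Path("www.google.com")) after a scrape would then turn a lone PART01.html into google.html inside that directory.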
The --attributes option is essentially a reduced version of the --xpath option, per the README:
- If you only want to specify specific tag attributes to extract rather than an entire XPath, use --attributes. The default choice is to extract only text attributes, but you can specify one or many different attributes (such as href, src, title, or any attribute available..).
Specifying --attributes href, for example, would retrieve only the contents of all the href HTML attributes. This flag cannot be used in conjunction with storing HTML output, however. You can test it using any of the other output options (e.g. print, text, csv, pdf).
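To make the effect concrete, here is a small illustration of what "retrieve only the contents of all the href attributes" means. It is not how scrape implements the flag; it is a standalone sketch using Python's standard html.parser module, and the names AttributeCollector and extract_attributes are made up for this example.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the values of the requested attributes from every tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in self.wanted and value is not None:
                self.values.append(value)


def extract_attributes(html, wanted):
    """Return attribute values in document order (illustrative only)."""
    collector = AttributeCollector(wanted)
    collector.feed(html)
    collector.close()
    return collector.values
```

For a page containing two links and an image, extract_attributes(page, ["href"]) would return just the two link targets, and passing ["href", "src"] would include the image source as well.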