Search-Engines-Scraper
How does the filter argument work?
How can I filter to exclude two hosts (wikipedia.org and facebook.com)?
According to the docs, filtering is done via the -f argument.
In the script, all I can find is '-f', filter results [url, title, text, host].
Since -o json writes the output as JSON and is described as '-o', help='output file [html, csv, json]', I expected something along the lines of -f host REGEX, but that does not work.
The -f argument is somewhat similar to Google's advanced search operators. The difference is that it doesn't accept a value of its own: the search query is the value. The filter is also inclusive only, so it cannot exclude hosts, and it doesn't accept regular expressions. For example, if the search query is "query" and the filter is "url", only links that contain "query" in the URL will be collected. That is equivalent to Google's advanced search operator "allinurl: query". If you think this feature can be improved, you're very welcome to contribute, or I may do it myself when I have some free time.
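In the meantime, excluding hosts can be done after the fact on the collected results. The snippet below is only a rough sketch, not the scraper's actual code: it assumes each result is a dict with "link", "title" and "text" keys, and both function names (matches_filter, exclude_hosts) are hypothetical. The first function illustrates the inclusive behaviour described above (the case-insensitive match is an assumption); the second shows one way to drop wikipedia.org and facebook.com from a result list.

```python
from urllib.parse import urlparse


def host_of(link):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return urlparse(link).hostname or ""


def matches_filter(result, query, field):
    """Inclusive filter: keep a result only if the query appears in the field.

    `field` is one of 'url', 'title', 'text', 'host'. The result layout
    (dict with 'link', 'title', 'text') is an assumption for this sketch.
    """
    if field == "url":
        value = result.get("link", "")
    elif field == "host":
        value = host_of(result.get("link", ""))
    else:
        value = result.get(field, "")
    return query.lower() in value.lower()


def exclude_hosts(results, blocked):
    """Drop results whose host equals, or is a subdomain of, a blocked host."""
    kept = []
    for result in results:
        host = host_of(result.get("link", ""))
        if any(host == b or host.endswith("." + b) for b in blocked):
            continue
        kept.append(result)
    return kept


# Made-up sample data, for illustration only.
results = [
    {"link": "https://en.wikipedia.org/wiki/Web_scraping",
     "title": "Web scraping", "text": "encyclopedia article"},
    {"link": "https://www.facebook.com/groups/scraping",
     "title": "Scraping group", "text": "social page"},
    {"link": "https://example.com/scraping-guide",
     "title": "A guide", "text": "tutorial"},
]

filtered = exclude_hosts(results, ["wikipedia.org", "facebook.com"])
for result in filtered:
    print(result["link"])
```

If the results were written with -o json, the same exclude_hosts step could be applied to the loaded file, provided its entries have the layout assumed here.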