Search-Engines-Scraper

How does the filter argument work?

Open minthemiddle opened this issue 5 years ago • 1 comment

How can I filter to exclude two hosts (wikipedia.org and facebook.com)?

According to the docs, filtering is done via the -f argument. The only description I can find in the script is: '-f', filter results [url, title, text, host].

Since -o json writes the output as JSON and is described as '-o', help='output file [html, csv, json]', I expected something along the lines of -f host REGEX, but that does not work.

minthemiddle, Apr 04 '20 14:04

The -f argument is somewhat similar to Google's advanced search operators. The difference is that it does not take a value of its own: the search query is the value. The filter is also inclusive only, and it does not accept regular expressions, so it cannot be used to exclude hosts. For example, if the search query is "query" and the filter is "url", only links that contain "query" in the URL are collected, which is equivalent to Google's advanced search operator "allinurl: query".

If you think this feature can be improved, you're very welcome to contribute, or I may do it myself when I have some free time.
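To make the behaviour described above concrete, here is a minimal sketch of an inclusive filter, plus one way to exclude hosts after the results have been collected. It is not the library's actual code: the result keys ("link", "title", "text") and the function names matches_filter and exclude_hosts are assumptions chosen for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical mapping from filter name to the result key it inspects.
# The keys "link", "title" and "text" are assumptions, not the library's schema.
FILTER_KEYS = {"url": "link", "title": "title", "text": "text", "host": "link"}


def matches_filter(result, query, filter_type=None):
    """Inclusive filter: keep a result only if the query appears in the chosen field."""
    if filter_type is None:
        return True  # no filter requested, keep everything
    value = result.get(FILTER_KEYS[filter_type], "")
    if filter_type == "host":
        value = urlparse(value).netloc  # compare against the host part only
    return query.lower() in value.lower()


def exclude_hosts(results, blocked_hosts):
    """Post-processing step: drop results whose host is a blocked domain or a subdomain of it."""
    kept = []
    for result in results:
        host = urlparse(result["link"]).netloc.lower()
        blocked = any(host == b or host.endswith("." + b) for b in blocked_hosts)
        if not blocked:
            kept.append(result)
    return kept


results = [
    {"link": "https://en.wikipedia.org/wiki/Query", "title": "Query", "text": "An article"},
    {"link": "https://www.facebook.com/query", "title": "Query page", "text": ""},
    {"link": "https://example.com/docs/query-language", "title": "Docs", "text": "A query language"},
    {"link": "https://example.org/about", "title": "About", "text": "Nothing relevant"},
]

# Equivalent in spirit to "allinurl: query": keep links with "query" in the URL.
in_url = [r for r in results if matches_filter(r, "query", "url")]

# Excluding wikipedia.org and facebook.com has to happen outside the filter.
without_blocked = exclude_hosts(results, ["wikipedia.org", "facebook.com"])
```

Since the filter only includes, exclusion like this would have to be applied to the collected results, for example after loading the file written with -o json, whose exact structure should be checked before relying on the key names used here.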

tasos-py, Apr 04 '20 18:04