Benjamin Estes
Updated robots.txt in the CactusBlog example to reflect Cactus update: https://github.com/koenbok/Cactus/commit/8aef21732bfc9aaa338306c540422e7141540cdc
If for some reason a site blocks its own sitemap with a robots.txt file, the crawler should respect that and not request the sitemaps in sitemap mode.
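A minimal sketch of that check, using the standard library's urllib.robotparser; the user agent string and URLs below are placeholders, not pyscape's actual values:

```python
import urllib.robotparser

def sitemap_allowed(site_root, sitemap_url, user_agent="*"):
    # Hypothetical helper: consult the site's robots.txt before fetching a sitemap.
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(site_root.rstrip("/") + "/robots.txt")
    parser.read()
    return parser.can_fetch(user_agent, sitemap_url)

# Skip sitemap mode entirely when the site disallows its own sitemap.
if not sitemap_allowed("https://example.com", "https://example.com/sitemap.xml"):
    print("sitemap blocked by robots.txt; not requesting it")
```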
(see subject)
The Config file is the most error-prone part of the process from the user's perspective. However, we can't really get around this — there are just a lot of choices...
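One way to soften this is to validate the user's config against the known options and fail early with a pointed message. A sketch of that idea follows; the option names are invented for illustration and are not pyscape's actual config schema:

```python
# Hypothetical config validation; key names are placeholders, not pyscape's real options.
KNOWN_OPTIONS = {"access_id", "secret_key", "columns", "batch_size"}

def validate_config(config):
    """Reject unknown keys up front instead of failing later in the run."""
    unknown = set(config) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(
            "Unknown config option(s): {}. Valid options are: {}".format(
                ", ".join(sorted(unknown)), ", ".join(sorted(KNOWN_OPTIONS))
            )
        )

validate_config({"access_id": "member-xxxx", "secrt_key": "oops"})  # raises ValueError
```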
From GK's input file:
```
Traceback (most recent call last):
  File "/Users/benjamin/.virtualenvs/test5/bin/pyscape", line 5, in <module>
    pkg_resources.run_script('pyscape-client==2015.02b2', 'pyscape')
  File "/Users/benjamin/.virtualenvs/test5/lib/python3.4/site-packages/pkg_resources.py", line 534, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/benjamin/.virtualenvs/test5/lib/python3.4/site-packages/pkg_resources.py", line 1441, in...
```
While getting numerous URLs using the CLI:
```
Traceback (most recent call last):
  File "/Users/benjamin/.virtualenvs/test5/lib/python3.4/site-packages/requests-2.5.3-py3.4.egg/requests/packages/urllib3/connectionpool.py", line 372, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'
...
```
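For context, this TypeError usually originates in urllib3's Python 2/3 compatibility shim rather than in the request itself: Python 3's http.client.getresponse() does not accept the buffering keyword, and urllib3 catches the TypeError and retries without it, so the real failure is likely further down in the truncated traceback. A rough, self-contained illustration of that fallback pattern (not the vendored requests/urllib3 source):

```python
import http.client

# Illustrative only: the same call-with-fallback pattern that produces the
# TypeError seen above when running under Python 3.
conn = http.client.HTTPConnection("example.com")
conn.request("GET", "/")
try:
    # Python 2's httplib accepted buffering=True for faster socket reads.
    response = conn.getresponse(buffering=True)
except TypeError:
    # Python 3's http.client dropped that keyword, so retry without it.
    response = conn.getresponse()
print(response.status)
```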