
DocSearch - Scraper

Results 28 docsearch-scraper issues

A user expected the crawler to respect the `` meta tag that should tell crawlers to skip a page. We don't honor this tag at all (nor do we honor...

enhancement

I have personally experienced `Ctrl-C` resulting in an incomplete index. The Scrapy documentation for the `spider_closed` signal, https://docs.scrapy.org/en/latest/topics/signals.html#scrapy.signals.spider_closed , mentions that the reason for the closing should be `finished` under normal...
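A minimal sketch of the guard this issue hints at, assuming the scraper builds a temporary index and only promotes it when the crawl ends normally. Scrapy passes a `reason` string to `spider_closed` handlers (`finished` for a normal end, `shutdown` when the engine is stopped with `Ctrl-C`); the function names and the `publish`/`discard` callbacks below are hypothetical and not taken from the repository.

```python
from typing import Callable

# Close reason Scrapy reports when a spider runs to completion.
FINISHED = "finished"


def should_publish_index(reason: str) -> bool:
    """Return True only when the crawl ended normally."""
    return reason == FINISHED


def on_spider_closed(
    reason: str,
    publish: Callable[[], None],
    discard: Callable[[], None],
) -> str:
    """Hypothetical close handler: promote the temporary index on a
    normal finish, otherwise throw it away so a partial crawl never
    replaces the live index. Returns the action that was taken."""
    if should_publish_index(reason):
        publish()
        return "published"
    discard()
    return "discarded"


# Usage: record which callback fires for each close reason.
calls = []
on_spider_closed("finished", lambda: calls.append("publish"), lambda: calls.append("discard"))
on_spider_closed("shutdown", lambda: calls.append("publish"), lambda: calls.append("discard"))
```

In a real spider the handler would be connected to `scrapy.signals.spider_closed`, which also passes the `spider` argument; it is left out here to keep the sketch free of dependencies.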

Relates to #459. With this PR, I'm trying to start restructuring and improving the Docker image. The image uses a lot of layers for no obvious reason. Let's try...

Pinning `google-chrome-stable` is not the easiest as versions are removed from time to time as the newer versions usually become the stable ones. I've seen efforts in bumping the Chrome...

Allows the Chrome WebDriver to authenticate using an auth cookie pulled from the .env file. This would allow scraping of password-protected documentation.

# Situation When a configuration includes `custom_settings.attributesForFaceting`, the index's `attributesForFaceting` setting no longer includes `tags`. This overrides the `default_settings` defined by the strategy. `tags` defined from `start_urls` are not...

bug
help wanted
need_fix_test
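The override described in the `attributesForFaceting` issue can be illustrated with a small merge helper. This is only a sketch of one possible fix, assuming both settings objects are plain dictionaries and that the defaults list `tags`; `merge_faceting` is a hypothetical name, and the example values are made up for illustration.

```python
def merge_faceting(default_settings: dict, custom_settings: dict) -> dict:
    """Merge settings so custom values win for every key except
    attributesForFaceting, whose lists are combined (defaults first,
    duplicates dropped) instead of replaced."""
    key = "attributesForFaceting"
    merged = {**default_settings, **custom_settings}

    combined = list(default_settings.get(key, []))
    for attribute in custom_settings.get(key, []):
        if attribute not in combined:
            combined.append(attribute)

    if combined:
        merged[key] = combined
    return merged


# Usage with illustrative values (assumed, not the scraper's real defaults).
defaults = {"attributesForFaceting": ["tags"], "hitsPerPage": 20}
custom = {"attributesForFaceting": ["version", "tags"], "hitsPerPage": 5}
result = merge_faceting(defaults, custom)
```

With a plain `{**defaults, **custom}` merge the custom list would replace the default one outright, which is the behavior the issue reports.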

The docker image that's being published for this repository is severely fragmented. As a result, it takes a much longer time to download than it should and consumes a lot...

bug
enhancement
help wanted

It would be awesome to throw a page up that just displays a list of the last index times for all the configs. I know that it would help me...

enhancement

Currently, the scraper assumes it's writing progress messages to an ANSI-compatible terminal. As a result, the progress messages look like this in a CI environment: ``` [94m> DocSearch: [0mhttps://docs.couchbase.com/server/6.0/introduction/intro.html ([93m51...

bug
enhancement
help wanted
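For the progress-message issue, a common approach is to emit ANSI color codes only when the output stream is an interactive terminal. The sketch below assumes that approach; `supports_color` and `format_progress` are hypothetical helpers, and honoring the `NO_COLOR` environment variable is an added convention, not something the issue asks for.

```python
import io
import os
import sys

# ANSI escape sequences matching the codes visible in the report (94 and 0).
BLUE = "\033[94m"
RESET = "\033[0m"


def supports_color(stream) -> bool:
    """Return True when the stream is a TTY and NO_COLOR is not set."""
    if os.environ.get("NO_COLOR"):
        return False
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def format_progress(url: str, stream=sys.stdout) -> str:
    """Build a progress line, colored only if the stream supports it."""
    prefix = "> DocSearch: "
    if supports_color(stream):
        return f"{BLUE}{prefix}{RESET}{url}"
    return f"{prefix}{url}"


# Usage: an in-memory stream stands in for a CI log, which is not a TTY.
ci_log = io.StringIO()
plain_line = format_progress("https://example.com/docs", ci_log)
```

Because CI runners usually pipe output to a file or log collector, `isatty()` returns false there and the line is written without escape codes.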