docsearch-scraper
Output CI-friendly progress messages
Currently, the scraper assumes it's writing progress messages to an ANSI-compatible terminal. As a result, the progress messages look like this in a CI environment:
[94m> DocSearch: [0mhttps://docs.couchbase.com/server/6.0/introduction/intro.html ([93m51 records[0m)
[94m> DocSearch: [0mhttps://docs.couchbase.com/home/contribute/includes.html ([93m23 records[0m)
[94m> DocSearch: [0mhttps://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/index.html ([93m28 records[0m)
Either add an option to output plain messages or automatically detect if ANSI color codes are not supported.
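A minimal sketch of the two options, assuming the scraper is Python and that a TTY check plus the informal `NO_COLOR` environment-variable convention is a good enough heuristic. `supports_color`, `use_color` and `format_progress` are hypothetical names, not the scraper's actual functions, and the `"auto"`/`"always"`/`"never"` modes are an assumed shape for the option.

```python
import os
import sys

# ANSI SGR sequences matching the colors in the output above:
# 94 = bright blue, 93 = bright yellow, 0 = reset.
BLUE = "\033[94m"
YELLOW = "\033[93m"
RESET = "\033[0m"


def supports_color(stream=None):
    """Heuristic: emit ANSI codes only when writing to an interactive terminal."""
    stream = stream if stream is not None else sys.stdout
    # NO_COLOR convention: any non-empty value disables color.
    if os.environ.get("NO_COLOR"):
        return False
    # TERM=dumb advertises a terminal without escape-sequence support.
    if os.environ.get("TERM") == "dumb":
        return False
    # Pipes and log files (typical in CI such as Jenkins) are not TTYs.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def use_color(mode="auto", stream=None):
    """Hypothetical option: "always" and "never" override the detection."""
    if mode == "always":
        return True
    if mode == "never":
        return False
    return supports_color(stream)


def format_progress(url, nb_records, color):
    """Build one progress line, with or without ANSI codes."""
    if color:
        return f"{BLUE}> DocSearch: {RESET}{url} ({YELLOW}{nb_records} records{RESET})"
    return f"> DocSearch: {url} ({nb_records} records)"
```

With this, `print(format_progress(url, n, use_color()))` would produce colored output in a local terminal and plain lines when stdout is redirected to a CI log.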
The easiest way to accomplish this might be to route the messages through a logger which can be configured separately. I'd also be interested in silencing the messages completely, which a logger would also help with.
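A rough sketch of the logger approach using Python's standard `logging` module. The logger name `docsearch.progress` and the helpers `report_progress` and `configure_progress_logging` are hypothetical; the point is that formatting, destination and verbosity move out of the scraper and into configuration, so silencing becomes a level change.

```python
import logging
import sys

# Hypothetical logger name; the scraper's real module layout may differ.
logger = logging.getLogger("docsearch.progress")


def report_progress(url, nb_records):
    # Goes through logging instead of print(), so callers control the output.
    logger.info("> DocSearch: %s (%d records)", url, nb_records)


def configure_progress_logging(level=logging.INFO, stream=None):
    """Attach a plain-text handler; call once, since each call adds a handler."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    # Plain message only: no ANSI codes, so CI logs stay readable.
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    # Keep progress lines out of the root logger's handlers.
    logger.propagate = False
    return handler
```

To silence progress messages completely, a user could raise the threshold, for example `logging.getLogger("docsearch.progress").setLevel(logging.WARNING)`, without touching the scraper code.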
I should note that not all CI environments have this problem. For instance, GitLab CI is capable of showing ANSI color codes. Jenkins, on the other hand, is not.
Having a proper logger is one of our objectives. There is no ETA yet; we will address this while moving our codebase to a proper Python 3/Scrapy integration.
:+1:
If you need help, don't hesitate to ask. I'll be using docsearch for the foreseeable future, so I'll be around.
Thanks! Send us an email at [email protected]; we have a small gift for you :)