Web crawler for reporting

Open photodow opened this issue 4 years ago • 0 comments

If we can build a web crawler to validate pages and report back, we can provide some major value to the Carbon teams. We've already built a web crawler, but the issue I ran into was data. I think having Carbon telemetry (#11) will help us in this area. Then we need a way to add new links and manage the queue of links to validate.

Assuming IBM.com has 30 million pages and we'd like to check each URL roughly once a month, we need to be able to process roughly 12 URLs per second, give or take.
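A quick back-of-the-envelope check of that throughput figure (the 30 million page count is the assumption stated above; the month length is approximated as 30 days):

```python
# Rough throughput estimate: re-check every page once a month.
PAGES = 30_000_000                      # assumed IBM.com page count
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # ~30-day month

urls_per_second = PAGES / SECONDS_PER_MONTH
print(round(urls_per_second, 1))  # ≈ 11.6 URLs/second
```

So a single crawler sustaining ~12 URLs/second, or a few workers splitting that rate, would cover the whole site monthly.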

Things to consider...

  • [ ] Database needed to upload reports (#11)
  • [ ] URL Queue
  • [ ] Interface to add and manage URL Queue
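One possible shape for the URL queue piece is a simple producer/worker setup. This is only a sketch under assumptions: the `validate` stub and `worker` function are hypothetical names, and a real system would use a persistent queue and store reports in the database from #11 rather than an in-memory list.

```python
import queue
import threading

def validate(url: str) -> dict:
    # Placeholder: a real implementation would fetch the page and
    # run the Carbon validation checks, returning a report.
    return {"url": url, "ok": True}

def worker(url_queue: queue.Queue, reports: list) -> None:
    # Drain the queue, validating each URL and collecting reports.
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return
        reports.append(validate(url))
        url_queue.task_done()

# Seed the queue (in practice the interface above would manage this).
url_queue: queue.Queue = queue.Queue()
for u in ["https://www.ibm.com/", "https://www.ibm.com/products"]:
    url_queue.put(u)

reports: list = []
threads = [threading.Thread(target=worker, args=(url_queue, reports))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(reports))  # 2
```

Scaling to ~12 URLs/second would mostly be a matter of worker count and network concurrency, since validation itself is I/O-bound.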

photodow — Oct 01 '20 16:10