QA Crawl Support

Open ikreymer opened this issue 2 years ago • 2 comments

Initial support for QA crawls! Can be deployed with the webrecorder/browsertrix-crawler qa entrypoint.

Requires --qaSource, pointing to the WACZ or multi-WACZ JSON that will be QA'd.

Also supports --qaRedisKey, the Redis key to which QA comparison data will be pushed, if specified. Supports --qaDebugImageDiff for outputting crawl, replay, and diff images.

The data pushed to Redis is {"url": "<page url>", "comparison": <...>} where comparison is:

  comparison: {
    screenshotMatch?: number;
    textMatch?: number;
    resourceCounts: {
      crawlGood?: number;
      crawlBad?: number;
      replayGood?: number;
      replayBad?: number;
    };
  };
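As a rough illustration of consuming this data, here is a minimal sketch that parses one pushed entry and checks its shape. The interface names, `parseQAEntry`, `needsReview`, the sample values, and the 0.9 threshold are all hypothetical and not part of the crawler; it also assumes the match scores are fractions where higher means closer, which is not stated above.

```typescript
// Shape of one entry, following the fields listed above.
interface ResourceCounts {
  crawlGood?: number;
  crawlBad?: number;
  replayGood?: number;
  replayBad?: number;
}

interface Comparison {
  screenshotMatch?: number;
  textMatch?: number;
  resourceCounts: ResourceCounts;
}

interface QAEntry {
  url: string;
  comparison: Comparison;
}

// Hypothetical helper: parse a raw JSON string and validate the required fields.
function parseQAEntry(raw: string): QAEntry {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("QA entry must be a JSON object");
  }
  const { url, comparison } = data as Record<string, unknown>;
  if (typeof url !== "string") {
    throw new Error("QA entry is missing a string 'url'");
  }
  if (typeof comparison !== "object" || comparison === null) {
    throw new Error("QA entry is missing 'comparison'");
  }
  const counts = (comparison as Record<string, unknown>).resourceCounts;
  if (typeof counts !== "object" || counts === null) {
    throw new Error("QA entry is missing 'comparison.resourceCounts'");
  }
  return data as QAEntry;
}

// Hypothetical helper: flag a page when any score that is present falls
// below the threshold. Assumes higher scores mean a closer match.
function needsReview(entry: QAEntry, threshold: number = 0.9): boolean {
  const { screenshotMatch, textMatch } = entry.comparison;
  const low = (score?: number): boolean =>
    score !== undefined && score < threshold;
  return low(screenshotMatch) || low(textMatch);
}

// Illustrative payload only; the values are made up.
const sample: string = JSON.stringify({
  url: "https://example.com/",
  comparison: {
    screenshotMatch: 0.97,
    textMatch: 0.85,
    resourceCounts: { crawlGood: 12, crawlBad: 1, replayGood: 11, replayBad: 2 },
  },
});

const entry: QAEntry = parseQAEntry(sample);
console.log(entry.url, needsReview(entry));
```

Since every score is optional, the sketch treats a missing score as "nothing to flag" rather than as a failure; a consumer might reasonably choose the opposite.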

ikreymer avatar Feb 20 '24 17:02 ikreymer

Could we also add the page id to the data pushed to Redis, just to help with matching in Browsertrix?

tw4l avatar Feb 21 '24 21:02 tw4l

> Could we also add the page id to the data pushed to Redis, just to help with matching in Browsertrix?

The QA data is now merged with the page data, so the two should already be in one place.

ikreymer avatar Mar 20 '24 19:03 ikreymer