browsertrix-crawler QA Crawl Support

Initial support for QA crawls! A QA crawl can be deployed using the qa entrypoint of webrecorder/browsertrix-crawler.

Requires --qaSource, pointing to the WACZ or multi-WACZ JSON that will be QA'd.

Also supports --qaRedisKey: if specified, QA comparison data will be pushed to that Redis key. Supports --qaDebugImageDiff for outputting crawl / replay / diff images.

The data pushed to Redis is {"url": "<page url>", "comparison": <...>}, where comparison is:

  comparison: {
    screenshotMatch?: number;
    textMatch?: number;
    resourceCounts: {
      crawlGood?: number;
      crawlBad?: number;
      replayGood?: number;
      replayBad?: number;
    };
  };
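
For anyone consuming these entries, here is a minimal sketch of how one pushed entry might be typed and validated on the reading side. It assumes each entry arrives as a single JSON string in the shape described above. The names QAEntry and parseQAEntry are hypothetical and not part of this PR, and the sample values are made up for illustration.

```typescript
// Shape of the comparison object, copied from the description above.
interface QAComparison {
  screenshotMatch?: number;
  textMatch?: number;
  resourceCounts: {
    crawlGood?: number;
    crawlBad?: number;
    replayGood?: number;
    replayBad?: number;
  };
}

// Assumed envelope for one pushed entry (hypothetical name).
interface QAEntry {
  url: string;
  comparison: QAComparison;
}

// Hypothetical helper: parse one raw JSON string and check the required fields.
function parseQAEntry(raw: string): QAEntry {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("QA entry is not a JSON object");
  }
  const { url, comparison } = data as Record<string, unknown>;
  if (typeof url !== "string") {
    throw new Error("QA entry is missing a string 'url'");
  }
  if (typeof comparison !== "object" || comparison === null) {
    throw new Error("QA entry is missing 'comparison'");
  }
  const { resourceCounts } = comparison as Record<string, unknown>;
  if (typeof resourceCounts !== "object" || resourceCounts === null) {
    throw new Error("QA entry is missing 'comparison.resourceCounts'");
  }
  return { url, comparison: comparison as QAComparison };
}

// Illustrative sample only; the numbers are invented.
const sample = JSON.stringify({
  url: "https://example.com/",
  comparison: {
    screenshotMatch: 0.98,
    textMatch: 1,
    resourceCounts: { crawlGood: 12, crawlBad: 1, replayGood: 12, replayBad: 1 },
  },
});
const entry = parseQAEntry(sample);
```

The optional fields are left unchecked because the type marks them as optional; only url, comparison, and resourceCounts are treated as required here.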

Feb 20 '24 17:02 ikreymer

Could we also add the page id to the data pushed to Redis, just to help with matching in Browsertrix?

Feb 21 '24 21:02 tw4l

> Could we also add the page id to the data pushed to Redis, just to help with matching in Browsertrix?

The QA data is now merged with the page data, so it should already all be in one place.

Mar 20 '24 19:03 ikreymer