QA Crawl Support
Initial support for QA crawl!
Can be deployed with the webrecorder/browsertrix-crawler qa entrypoint.
Requires --qaSource, pointing to the WACZ or multi-WACZ JSON that will be QA'd.
Also supports --qaRedisKey; if specified, QA comparison data will be pushed to that Redis key.
Supports --qaDebugImageDiff for outputting crawl / replay / diff images.
The data pushed to Redis is {"url": <page url>, "comparison": <...>} where comparison is:
comparison: {
  screenshotMatch?: number;
  textMatch?: number;
  resourceCounts: {
    crawlGood?: number;
    crawlBad?: number;
    replayGood?: number;
    replayBad?: number;
  };
};
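For anyone consuming these entries, here is a minimal sketch of how a consumer might type and validate one entry pushed to Redis. The field names come from the shape above; the QAComparison and QAEntry interface names and the parseQAEntry and replayBadDelta helpers are hypothetical and not part of this PR. Reading from Redis itself is left out, so the sketch starts from the raw JSON string.

```typescript
// Shape of the comparison object, copied from the description above.
interface QAComparison {
  screenshotMatch?: number;
  textMatch?: number;
  resourceCounts: {
    crawlGood?: number;
    crawlBad?: number;
    replayGood?: number;
    replayBad?: number;
  };
}

// One entry as pushed to the key given by --qaRedisKey.
interface QAEntry {
  url: string;
  comparison: QAComparison;
}

// Hypothetical helper: parse and minimally validate one raw JSON entry.
function parseQAEntry(raw: string): QAEntry {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("QA entry is not a JSON object");
  }
  const { url, comparison } = data as Record<string, unknown>;
  if (typeof url !== "string") {
    throw new Error("QA entry is missing a string url");
  }
  if (typeof comparison !== "object" || comparison === null) {
    throw new Error("QA entry is missing a comparison object");
  }
  const counts = (comparison as Record<string, unknown>).resourceCounts;
  if (typeof counts !== "object" || counts === null) {
    throw new Error("comparison is missing resourceCounts");
  }
  return { url, comparison: comparison as QAComparison };
}

// Hypothetical helper: how many more bad resources replay had than the crawl.
// Missing counts are treated as 0, which is an assumption of this sketch.
function replayBadDelta(entry: QAEntry): number {
  const { crawlBad = 0, replayBad = 0 } = entry.comparison.resourceCounts;
  return replayBad - crawlBad;
}
```

Because screenshotMatch, textMatch and the individual counts are all optional, a consumer should handle missing values explicitly rather than assume every page has every score.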
Could we also add the page id to the data pushed to Redis, just to help with matching in Browsertrix?
The QA data is now merged with the page data, so everything should already be in one place.