Save screenshot/HTML on first occurrence of error in error statistics

Open metalwarrior665 opened this issue 1 year ago • 2 comments

Which package is the feature request for? If unsure which one to select, leave blank

None

Feature

There is already a robust system for organizing error statistics. For some scrapers, I use an "ErrorSnapshotter" approach: on the first occurrence of each type of error, a screenshot and/or the HTML is stored in the KV store for further analysis. We should also store a link (either an Apify URL or a local path) to the snapshot KV records next to the error count in the statistics.

Here is an example of how the stats file looks now: https://api.apify.com/v2/key-value-stores/L7DclFFX3fHuPCne9/records/SDK_CRAWLER_STATISTICS_0

Motivation

Useful for faster default debugging, especially for "one of thousands" type of errors or when other users are running the scraper.

Ideal solution or implementation, and any additional constraints

Implementation ideas

The current implementation is buried under several layers of function calls, so adding completely new functionality is a bit tricky. The main classes are Statistics and ErrorTracker.

  1. The dirty solution would be to send the crawling context through the function calls and then just dynamically figure out if it is Puppeteer, Playwright, or HTML body and use the appropriate snapshotting method from context.
  2. The more proper way would probably be to use generics all the way down, but I haven't explored that option.
  3. Or do a larger refactor.
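
A minimal sketch of the "dirty" option 1, assuming the crawling context (if any) reaches the error tracker. The types `PageLike` and `ContextLike` and the function `pickSnapshotStrategy` are hypothetical names made up for illustration, not Crawlee's real types. The only assumptions about the context are that browser crawlers expose a `page` with an `isClosed()` method (both Puppeteer and Playwright pages have one) and that HTTP-based crawlers expose a `body`.

```typescript
// Minimal structural types: assumptions for this sketch, not Crawlee's real context types.
interface PageLike {
    isClosed(): boolean;
}

interface ContextLike {
    page?: PageLike;            // expected in Puppeteer/Playwright contexts
    body?: string | Uint8Array; // expected in HTTP-based (e.g. Cheerio) contexts
}

type SnapshotStrategy = 'screenshot-and-html' | 'html-only' | 'none';

// Decide at runtime which kind of snapshot can be taken for a failed request.
function pickSnapshotStrategy(context?: ContextLike): SnapshotStrategy {
    // The error happened before any context existed.
    if (!context) return 'none';

    const { page, body } = context;

    // A live browser page allows both a screenshot and an HTML dump.
    if (page && !page.isClosed()) return 'screenshot-and-html';

    // No usable page, but a response body is still around.
    if (body !== undefined && body.length > 0) return 'html-only';

    // Nothing to snapshot (no page yet, or the page is closed and there is no body).
    return 'none';
}
```

The caller would then use `page.screenshot()` and `page.content()` for the first strategy, store `body` directly for the second, and record the error without a snapshot link for the third.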

Keep in mind

  1. Some errors happen before any page is created or opened, before navigation happens, or after the page is already closed (maybe then a response object is still available to store HTML?)
  2. We need to generate unique filenames. I like it when filenames carry some information, so one idea is a hash of the full error object path from ErrorTracker plus the first 30 (or 50?) characters of the error message for easy reading.
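
The filename idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the helper names `fnv1aHex` and `snapshotKey`, the `ERROR_SNAPSHOT_` prefix, the representation of the ErrorTracker path as an array of strings, and the choice of FNV-1a as a stand-in hash (the proposal does not name a hash function). Characters outside `a-zA-Z0-9` are replaced with dashes as a conservative guess at what is safe in a KV store key.

```typescript
// Short non-cryptographic hash (32-bit FNV-1a) rendered as 8 hex characters.
// A stand-in only; any stable hash would do.
function fnv1aHex(input: string): string {
    let hash = 0x811c9dc5;
    for (let i = 0; i < input.length; i++) {
        hash ^= input.charCodeAt(i);
        // Multiply by the FNV prime 16777619 using shifts, modulo 2^32.
        hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
    }
    return ('0000000' + (hash >>> 0).toString(16)).slice(-8);
}

// Build a snapshot key from the error's path in the tracker and its message.
// `errorPath` is a hypothetical representation of the nested keys under which
// the error is counted.
function snapshotKey(errorPath: string[], message: string, maxChars: number = 30): string {
    const hash = fnv1aHex(errorPath.join('\u0000'));
    const readable = message
        .slice(0, maxChars)              // keep only the start of the message
        .replace(/[^a-zA-Z0-9]+/g, '-')  // replace runs of unsafe characters
        .replace(/^-+|-+$/g, '');        // trim leading/trailing dashes
    return readable ? `ERROR_SNAPSHOT_${hash}_${readable}` : `ERROR_SNAPSHOT_${hash}`;
}
```

Because the hash covers the whole path, two errors whose messages start with the same 30 characters but are counted under different paths still get distinct keys, while the readable suffix keeps the key list easy to scan.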

Alternative solutions or implementations

No response

Other context

No response

metalwarrior665, Jan 11 '24 18:01

Sounds like a duplicate of #1771, maybe we should close that one?

B4nan, Jan 12 '24 12:01

Haha, so I wasted some time :) Closed the old one.

metalwarrior665, Jan 12 '24 14:01