
Screenshots fail to save

Open jswrenn opened this issue 5 years ago • 3 comments

The collections view shows most screenshot fetch attempts failing with 404 responses: [image: collection view]

For that run, num_tabs: 20. Here's a snippet of the crawl log:

<2020-06-12 19:38:12 INFO> CrawlerTab[navigation_reset]: Resetting tab to about:blank
<2020-06-12 19:38:12 ERROR> CrawlerTab[capture_and_upload_screenshot]: capturing a screenshot of the page failed
Traceback (most recent call last):
  File "/app/autobrowser/tabs/basetab.py", line 385, in capture_and_upload_screenshot
    screen_shot = await self.capture_screenshot()
    │                   └ CrawlerTab(url=https://orgsync.com/89668/budget_admin/958123/review, running=True connected=True, graceful_shutdown=False, tab_i...
    └ None
  File "/app/autobrowser/tabs/basetab.py", line 372, in capture_screenshot
    format="png",
concurrent.futures._base.CancelledError

<2020-06-12 19:38:12 ERROR> CrawlerTab[capture_and_upload_screenshot]: capturing a screenshot of the page failed
Traceback (most recent call last):
  File "/app/autobrowser/tabs/basetab.py", line 385, in capture_and_upload_screenshot
    screen_shot = await self.capture_screenshot()
    │                   └ CrawlerTab(url=https://orgsync.com/89668/budget_admin/958120/review, running=True connected=True, graceful_shutdown=False, tab_i...
    └ None
  File "/app/autobrowser/tabs/basetab.py", line 372, in capture_screenshot
    format="png",
concurrent.futures._base.CancelledError

This mostly seems not to happen with num_tabs: 1.

jswrenn, Jun 12 '20 20:06

This problem doesn't occur when num_browsers: 20 (and num_tabs: 1). However, if I do this, the crawl appears to work alright, but remains stuck in the new state.

jswrenn, Jun 12 '20 21:06

@ikreymer Is it possible to disable screenshot collection altogether?

jswrenn, Jun 13 '20 17:06

Hey, apologies for the late response! Yes, a num_tabs of 20 is a lot! In fact, the screenshot feature may not work at all with multiple tabs; I've generally tried it with one tab only.

As for disabling screenshots: unfortunately, due to a bug, setting the collection to an empty value ends up using the default, but you can set it to a single space instead:

screenshot_coll: ' '
text_coll: ' '

It's a bit awkward, but I think this workaround should work until there's a better fix. text_coll controls the text extraction, in case you also want to disable that.
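
Putting the pieces of this thread together, a crawl spec that avoids the failing screenshots might contain something like the fragment below. This is only a sketch: it uses the four keys mentioned in this thread, assumes they all sit at the same level of the spec, and omits every other setting a real crawl needs.

```yaml
# Sketch only: keys are the ones named in this thread; all other crawl settings are omitted.
num_browsers: 1       # parallelism via num_browsers > 1 reportedly left the crawl stuck in the "new" state
num_tabs: 1           # screenshots have generally only been tried with a single tab
screenshot_coll: ' '  # a single space, not an empty string, so the default collection is not used
text_coll: ' '        # optional: same trick to disable text extraction
```

Leave screenshot_coll at its default if screenshots are still wanted; in that case keeping num_tabs at 1 is the part that matters.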

ikreymer, Jun 18 '20 23:06