browsertrix-crawler

Skipping autoscroll when page should be able to scroll

Open edsu opened this issue 2 years ago • 0 comments

When crawling https://library.stanford.edu/blogs/special-collections-unbound/2022/11/born-digital-collections-opened-research-2022 with browsertrix-crawler v0.10.0 I noticed that some resources (e.g. https://purl.stanford.edu/pq546tq4448/iiif/manifest) were not being captured because the page was not being autoscrolled.

Here is the configuration I used to capture this one page. I explicitly turned on autoscroll, even though it should be on by default when no behaviors are specified:

collection: sul-embed
screencastPort: 9037
generateWACZ: true
behaviors:
  - autoscroll
  - siteSpecific
  - autoplay
  - autofetch
seeds:
  - url: https://library.stanford.edu/blogs/special-collections-unbound/2022/11/born-digital-collections-opened-research-2022
    scopeType: page

The log messages show that autoscrolling is being skipped for some reason:

{"logLevel":"info","timestamp":"2023-05-24T21:54:50.589Z","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"Skipping autoscroll, page seems to not be responsive to scrolling events","page":"https://library.stanford.edu/blogs/special-collections-unbound/2022/11/born-digital-collections-opened-research-2022","workerid":0}}
{"logLevel":"info","timestamp":"2023-05-24T21:54:50.589Z","context":"behaviorScript","message":"Behavior log","details":{"state":{"segments":1},"msg":"done!","page":"https://library.stanford.edu/blogs/special-collections-unbound/2022/11/born-digital-collections-opened-research-2022","workerid":0}}

This appears to be a bug in how the autoscroll behavior detects whether the page can scroll. It is clearly possible to scroll the page.

edsu, May 24 '23 22:05