
Feature request: x-scroll header

Open mb21 opened this issue 1 year ago • 2 comments

We already have the x-timeout header, which works for a lot of JavaScript-heavy websites. But some websites lazy-load certain content only once you scroll down a bit.

Therefore, I propose an x-scroll header, which would execute the following JS after the page has finished loading:

window.scrollTo({
  top: document.body.scrollHeight,
  behavior: "smooth",
})

(I'm pretty sure that 'smooth' scrolling triggers any IntersectionObservers between the top and the bottom of the page.)

As soon as that's done and the event loop is empty, execute it again. Repeat until either scrolling down no longer increases the page's height or x-timeout is reached.
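To make the proposed loop concrete, here is a rough sketch of how it could look. The names `scrollUntilStable` and `browserPage` are made up for illustration, and the fixed 500 ms delay is only an assumed stand-in for "the event loop is empty"; none of this is an existing Reader API.

```javascript
// Hypothetical helper: scroll to the bottom repeatedly until the page stops
// growing or the time budget (think x-timeout) is used up.
// `page` is a small adapter object so the loop itself has no DOM dependency.
async function scrollUntilStable(page, timeoutMs, now = Date.now) {
  const deadline = now() + timeoutMs;
  let lastHeight = -1;
  let rounds = 0;
  while (now() < deadline) {
    const height = page.getHeight();
    if (height === lastHeight) break; // the previous scroll did not expand the page
    lastHeight = height;
    page.scrollToBottom();
    await page.settle(); // give lazy loaders a chance to run
    rounds += 1;
  }
  return { rounds, height: lastHeight };
}

// Assumed browser adapter; the 500 ms settle delay is an arbitrary guess.
const browserPage = () => ({
  getHeight: () => document.body.scrollHeight,
  scrollToBottom: () =>
    window.scrollTo({ top: document.body.scrollHeight, behavior: "smooth" }),
  settle: () => new Promise((resolve) => setTimeout(resolve, 500)),
});
```

In a page context this would be started with something like `scrollUntilStable(browserPage(), 10000)`, where the second argument plays the role of x-timeout in milliseconds.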

mb21 avatar Nov 04 '24 08:11 mb21

We have introduced a script injection mechanism to our API. Inside the page, we also provide these utility functions and event:

- waitForSelector(selector: string): Promise<HTMLElement> 
  waits for the selector to appear in the DOM
- simulateScroll(): void 
  simulates scrolling to the bottom of the page to trigger lazyload elements
- "mutationIdle" event on document 
  fires once the DOM has gone 200 ms without mutations

See https://github.com/jina-ai/reader/issues/150 for example

nomagick avatar Nov 12 '24 06:11 nomagick

Thanks! Seems curl ... --data-urlencode 'injectPageScript=document.addEventListener("mutationIdle", window.simulateScroll);' should indeed work for this, I'll give it a try. Feel free to close this issue then.
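For reference, a slightly longer variant of that one-liner could stop scrolling once the page height no longer changes. This is only a sketch: `installAutoScroll` and `maxRounds` are hypothetical names, and it assumes `simulateScroll` is reachable on `window` and that "mutationIdle" fires again after each scroll that triggers DOM changes.

```javascript
// Hypothetical script body for injectPageScript. `doc` and `win` are passed in
// so the logic can also be exercised outside a browser.
function installAutoScroll(doc, win, maxRounds = 20) {
  let lastHeight = -1;
  let rounds = 0;
  const onIdle = () => {
    const height = doc.body.scrollHeight;
    if (height === lastHeight || rounds >= maxRounds) {
      // The page stopped growing (or the safety cap was hit): stop listening.
      doc.removeEventListener("mutationIdle", onIdle);
      return;
    }
    lastHeight = height;
    rounds += 1;
    win.simulateScroll(); // utility described in the comment above
  };
  doc.addEventListener("mutationIdle", onIdle);
  return () => rounds; // lets the caller inspect how many scrolls happened
}

// Inside the injected script, wire it to the real globals when they exist.
if (typeof document !== "undefined" && typeof window !== "undefined") {
  installAutoScroll(document, window);
}
```

The `maxRounds` cap is a guard against infinitely scrolling feeds; the server-side timeout would still apply on top of it.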

mb21 avatar Nov 13 '24 08:11 mb21