Create a `ChromoteSession$navigate()` method?

Open gadenbuie opened this issue 1 year ago • 6 comments

As described in the README under Loading a page reliably, one cannot use the following pattern to reliably wait for the page to load.

# Not reliable
b$Page$navigate("https://www.r-project.org/")
b$Page$loadEventFired()  # Block until page has loaded
b$screenshot("browser.png")

Instead, we suggest a synchronous or asynchronous approach that involves correctly sequencing the $Page$loadEventFired() and $Page$navigate() commands.

In practice, however, most people just call Sys.sleep() for some reasonable or acceptable amount of time (#176).

Considering that we have a few convenience methods already, e.g. ChromoteSession$screenshot(), I'd like to propose that we offer a similar convenience method, ChromoteSession$navigate(), that automatically waits for the page load event and awaits the value (wait_ = TRUE) or returns the promise (wait_ = FALSE).

In the base case, that would simplify the above example code to just these two lines:

b$navigate("https://www.r-project.org/")
b$screenshot("browser.png")
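
For illustration, the proposed method could be sketched as a standalone helper built on the pattern the README recommends: start listening for the load event first, then navigate. The name navigate_and_wait() and its signature are hypothetical and not part of chromote; b is assumed to be a ChromoteSession.

```r
# Hypothetical helper, not part of chromote. `b` is assumed to be a
# ChromoteSession (anything exposing $Page and $wait_for() would work).
navigate_and_wait <- function(b, url, wait_ = TRUE) {
  # Register for the load event *before* navigating, so the event
  # cannot fire before we start listening for it.
  p <- b$Page$loadEventFired(wait_ = FALSE)
  b$Page$navigate(url, wait_ = FALSE)

  if (wait_) {
    # Block until the load event promise resolves.
    b$wait_for(p)
  } else {
    # Hand the promise back so the caller can chain on it.
    p
  }
}
```

With a live session, navigate_and_wait(b, "https://www.r-project.org/") followed by b$screenshot("browser.png") would correspond to the two-line example. A real implementation would probably also need a timeout, since the load event never fires if the navigation itself fails.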

gadenbuie, Sep 12 '24 01:09

I think this would be very helpful! FWIW, if the page is making API calls, $Page$loadEventFired() might not be enough for the page to really be "done". I don't have it fully loaded in my memory right now, but I remember that being the source of a lot of the complexity in {apisniffer}. I never felt like what I was doing was absolutely correct, but I was using callbacks for session$Network$requestWillBeSent and session$Network$responseReceived to make sure all of the calls that fired off had received a response (or a maximum wait time was reached) before considering the page fully loaded. That might be beyond the scope of this ticket, but I wanted to bring it up (because if you CAN handle that piece, too, that would be very cool 😊 )!
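
A rough sketch of that bookkeeping, assuming the event callbacks receive the event parameters as a list with a requestId field (as the DevTools Protocol defines for both events). new_request_tracker() is a hypothetical name and this is not the actual {apisniffer} code.

```r
# Hypothetical bookkeeping helper, not part of chromote or {apisniffer}.
new_request_tracker <- function() {
  pending <- character()
  list(
    # Callback for Network.requestWillBeSent: remember the request ID.
    on_request = function(msg) {
      pending <<- union(pending, msg$requestId)
      invisible(NULL)
    },
    # Callback for Network.responseReceived: the request was answered.
    on_response = function(msg) {
      pending <<- setdiff(pending, msg$requestId)
      invisible(NULL)
    },
    # Request IDs that have not yet received a response.
    pending = function() pending,
    # TRUE when every request seen so far has a response.
    is_idle = function() length(pending) == 0
  )
}

# Wiring it up needs a live session `b` (assumed to be a
# ChromoteSession), so it is shown commented out:
# tracker <- new_request_tracker()
# b$Network$requestWillBeSent(callback_ = tracker$on_request)
# b$Network$responseReceived(callback_ = tracker$on_response)
```

Requests that fail outright emit Network.loadingFailed rather than a response event, which is one reason a maximum wait time is still needed on top of this.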

jonthegeek, Sep 12 '24 11:09

Great point @jonthegeek. I like how Puppeteer handles this by accepting a predefined lifecycle event. In R, that would be a wait_until argument that takes one of the following values:

| wait_until | Event |
| --- | --- |
| "load" | Waits for the load event, i.e. b$Page$loadEventFired(). |
| "DOMContentLoaded" | Waits for the DOMContentLoaded event. |
| "networkIdle0" | Waits until there are no network connections for at least 500 ms. |
| "networkIdle2" | Waits until there are no more than 2 network connections for at least 500 ms. |
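
As a minimal sketch of how such an argument might dispatch, here are only the two event-based values; the network-idle variants would need extra request tracking and are left out. navigate_until() is a hypothetical name, b is assumed to be a ChromoteSession, and the sketch assumes the protocol's Page.domContentEventFired event is exposed as b$Page$domContentEventFired().

```r
# Hypothetical sketch, not part of chromote. `b` is assumed to be a
# ChromoteSession. Only the two event-based values are handled here.
navigate_until <- function(b, url,
                           wait_until = c("load", "DOMContentLoaded")) {
  # Validate the argument; an unknown value raises an error.
  wait_until <- match.arg(wait_until)

  # Start listening for the chosen lifecycle event before navigating.
  p <- switch(
    wait_until,
    load = b$Page$loadEventFired(wait_ = FALSE),
    DOMContentLoaded = b$Page$domContentEventFired(wait_ = FALSE)
  )

  b$Page$navigate(url, wait_ = FALSE)
  # Block until the selected event has fired.
  b$wait_for(p)
}
```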

I also really like Puppeteer's page.waitFor*() methods (listed here) and would love to see some of those make it into chromote. In particular:

  • waitForSelector() – waits for a CSS selector to have matching elements
  • waitForFunction() – waits for a function to return a truthy value
  • waitForNetworkIdle() – see above
  • waitForNavigation() – waits for the page to navigate to a new URL
  • waitForRequest() – waits for a request to a URL (or a request identified by a predicate function)
  • waitForResponse() – waits for a response from a URL (or a response identified by a predicate function)
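
To make the idea concrete, a waitForSelector() equivalent could be approximated today by polling the page with Runtime$evaluate(). This is only a sketch: wait_for_selector() and its timeout and interval arguments are hypothetical, b is assumed to be a ChromoteSession, and Puppeteer's own implementation works differently.

```r
# Hypothetical polling helper, not part of chromote. `b` is assumed to
# be a ChromoteSession. `timeout` and `interval` are in seconds.
wait_for_selector <- function(b, selector, timeout = 10, interval = 0.1) {
  # Quote the selector as a string literal. encodeString() follows R's
  # escaping rules, which is adequate for typical CSS selectors.
  js <- sprintf(
    "document.querySelector(%s) !== null",
    encodeString(selector, quote = '"')
  )
  deadline <- Sys.time() + timeout

  repeat {
    # Ask the page whether at least one element matches.
    found <- b$Runtime$evaluate(expression = js, returnByValue = TRUE)$result$value
    if (isTRUE(found)) {
      return(invisible(TRUE))
    }
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for selector: ", selector)
    }
    Sys.sleep(interval)
  }
}
```

Polling is the simplest option but not the most efficient; a built-in version could instead return a promise so that it composes with wait_ = FALSE.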

gadenbuie, Sep 12 '24 12:09

Oooh, I like waitForNetworkIdle()! I think for my specific use case I also want to wait for responses to sent events (with a timeout), but I think mixing in network idle would help deal with part of my "but is that everything?" problem.

jonthegeek, Sep 12 '24 13:09

Oh, ha, I just saw waitForResponse() is in there, too!

jonthegeek, Sep 12 '24 13:09

Good to know! I hadn't included waitForResponse() in my smaller list because I wasn't sure if that would be immediately useful to chromote users. Glad to get that feedback right away. (I'll edit the list above.) I do think waitForRequest() and waitForResponse() are a package deal.

gadenbuie, Sep 12 '24 13:09

Yeah, definitely! I'd have to understand how it works a little better; what I do right now is log requests until a certain time has passed (simulating waitForNetworkIdle(), kinda), and then check that there are corresponding response events for each of those requests. I think what I'd want to do is fire off a waitForResponse() (with a timeout) as a callback on session$Network$requestWillBeSent(), disabling that callback with waitForNetworkIdle(). That assumes waitForResponse() ties (or at least can tie) to a particular request.

I very much doubt that my use case will fall neatly onto the standard path, but having the helpers would definitely make my code feel less hacky!

jonthegeek, Sep 12 '24 13:09