crrri
Use of Network.setRequestInterception
Hi,
If I want to check a page and get some information about its network operations, I can do something like the following:
url <- "https://www.r-project.org/"
promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>%
    Network.enable() %>%
    Network.responseReceived() %...T>% {
      print("received")
      print(.$result$response$url)
    }
)
However, I'd like to use Network.setRequestInterception to be able to capture only certain requests. I tried to do it this way, but it doesn't seem to work:
promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>%
    Network.enable() %>%
    Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
    Network.requestIntercepted() %...T>% {
      print("intercepted")
    }
)
Do you have any idea what I'm doing wrong?
Thanks !
For this kind of task, I prefer the following script, which works:
library(crrri)
chrome <- chr_connect()
url = "https://www.rstudio.com"
intercepted <-
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted() %...T>% {
    print("intercepted")
  }
chrome %>%
  Page.navigate(url)
chr_disconnect(chrome)
This first creates a promise for the intercepted network request, then opens the URL.
If you want to execute a callback for each intercepted network request, you can use the .callback argument:
library(crrri)
chrome <- chr_connect()
url = "https://www.rstudio.com"
chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
  })
chrome %>%
  Page.navigate(url)
chrome %>%
  Page.navigate("https://cdn.rawgit.com/juba/rmdformats/master/resources/examples/material/material.html")
chr_disconnect(chrome)
Thanks !
One more question about your first example: if I run all the code at once, the request is not intercepted. I have to run the intercepted promise, wait a little, and then run the Page.navigate. Putting a Sys.sleep() in between doesn't seem to work either.
And if I try with headless=FALSE, I get an error when running the Page.navigate promise:
[error] handle_read_frame error: websocketpp.transport:7 (End of File)
You're right, I wrote an "interactive" script. My mistake.
library(crrri)
chrome <- chr_connect()
url = "https://www.rstudio.com"
configured <-
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*")))
intercepted <-
  configured %>%
  Network.requestIntercepted() %...T>% {
    print("intercepted")
  }
configured %>%
  Page.navigate(url) %...!% {
  }
intercepted %...>% {
  chr_disconnect(chrome)
}
Using a callback would be a better idea for this use case.
Yet another great use case for the event emitter API! I'll post a new example soon, hoping it works OK.
Ok, one more thing I fear...
Your last example works fine when using promises only, but it seems I'm in trouble if I try to use callbacks:
library(crrri)
chrome <- chr_connect("google-chrome")
url = "https://www.rstudio.com"
configured <- chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
  })
ended <- configured %>%
  Page.navigate(url) %>%
  Page.loadEventFired() %>%
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~.$root$nodeId) %...T>%
  { print(.$result)}
ended %...>% { chr_disconnect(chrome) }
In this case I get an error: Unhandled promise error: objet de type 'closure' non indiçable (in English: object of type 'closure' is not subsettable). It seems that the promise returned when using .callback "breaks" the pipeline.
Once again, I'm sorry if this is a misunderstanding on my part.
There are two different topics here: one related to crrri and the other related to the use of Chrome DevTools.
Using a callback in an event listener
As stated in the documentation, an event listener returns:
An async value of class promise. The value and the completion of the promise differ according to the use of a callback function. [...] When .callback is not NULL, the promise is fulfilled as soon as the callback is created; the value is a function without any argument that can be called to cancel the callback. When you use the .callback argument, you cannot send the result to any other command or event listener.
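That means the configured promise cannot be used the way this example uses it: its value is a function that cancels the callback, not a result that a command such as Page.navigate can consume.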
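The reported error can be reproduced without Chrome. Here is a minimal sketch in base R, assuming that `cancel` (a hypothetical name) stands in for the value that fulfils the promise when `.callback` is used, and that some later step in the pipeline subsets that value with `$`:

```r
# `cancel` stands in for the promise value: a function, not a result list
cancel <- function() invisible(NULL)

# Subsetting a function with `$` raises the 'closure' error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  cancel$root$nodeId,
  error = function(e) conditionMessage(e)
)
print(msg)
```

The exact wording of the message depends on the R locale, which is why it appears in French in the report above.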
Using Network.setRequestInterception()
In the example, all the requests are intercepted.
When the Page.navigate command is used, the request is intercepted and, unless it is explicitly continued, will never be sent...
So, the load event will never fire (the request is not sent, so there will be no response...)
Here's a modified and commented version of the last example:
library(crrri)
chrome <- chr_connect()
url = "https://www.rstudio.com"
configured <-
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  # intercept all the requests (why?):
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
  })
remove_callback <- function() {
  configured %...>%
    do.call(list())
}
# If you want to remove the callback, use:
# remove_callback()
ended <- # WARNING: this promise will never be resolved (see below)
  configured %...>% {
    chrome %>%
      Page.navigate(url) %>% # send a request
      Page.loadEventFired() %>% # Since all the requests are intercepted, the load event will never fire
      DOM.getDocument() %>%
      DOM.getOuterHTML(nodeId = ~.$root$nodeId) %...T>%
      { print(.$result)}
  }
# WARNING: The ended promise will never be resolved (use a timeout!)
ended %...>% { chr_disconnect(chrome) }
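The comments in this script suggest a timeout. One possible way to build one is sketched below, using only the promises and later packages rather than any crrri helper. `with_timeout` is a hypothetical function written for this example, and `never` is a stand-in for the `ended` promise that never settles:

```r
library(promises)
library(later)

# Hypothetical helper: returns a promise that settles like `p`, or is
# rejected after `delay` seconds, whichever happens first
with_timeout <- function(p, delay) {
  timer <- promise(function(resolve, reject) {
    later(function() reject(simpleError("timed out")), delay)
  })
  promise_race(p, timer)
}

# A promise that never settles, standing in for `ended`
never <- promise(function(resolve, reject) NULL)

outcome <- NULL
with_timeout(never, 0.1) %...>% {
  outcome <<- "resolved"
} %...!% {
  outcome <<- conditionMessage(.)
}

# In a non-interactive script the event loop has to be run by hand;
# the deadline guards against looping forever
deadline <- Sys.time() + 5
while (is.null(outcome) && Sys.time() < deadline) run_now(timeoutSecs = 0.1)
print(outcome)
```

With the real script, the rejection handler would be the place to call chr_disconnect(chrome), so the session is closed even when the load event never fires.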
Ok, thanks for the explanation. I didn't find this in the DevTools documentation, but that's what I think I mostly understood.
The way I see it, the only way for it to work would be to issue a Network.continueInterceptedRequest after the requestIntercepted. This works when using only promises, but when putting a callback in requestIntercepted, I don't see how this could be possible, as the promise pipeline is "broken".
Anyway, I think I can achieve most of what I'm trying to do by using responseReceived, which seems to be non-blocking.
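Since responseReceived does not pause anything, keeping "only certain requests" can be done on the R side after the fact. Here is a rough sketch in base R, assuming the URLs have already been collected from `.$result$response$url`; the `urls` vector is made-up sample data and the `*.css` pattern is just an example:

```r
# Made-up sample data standing in for collected response URLs
urls <- c(
  "https://www.r-project.org/",
  "https://www.r-project.org/css/R.css",
  "https://www.r-project.org/Rlogo.png"
)

# glob2rx() (from utils) turns a wildcard pattern into a regular expression
is_css <- grepl(glob2rx("*.css"), urls)

# Keep only the responses whose URL matches the pattern
print(urls[is_css])
```

The same test could be placed inside the responseReceived handler to print only the matching responses.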
You can use Network.continueInterceptedRequest like this:
library(crrri)
chrome <- chr_connect()
url = "https://www.rstudio.com"
configured <-
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  # intercept all the requests:
  Network.setRequestInterception(patterns = list(list(urlPattern="*")))
configured %>%
  Network.requestIntercepted() %...>% {
    print(.$result$request)
    Network.continueInterceptedRequest(.$ws, interceptionId = .$result$interceptionId)
  }
configured %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>%
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>%
  { print(.$result)}
Yes, this works, but can you add a callback in Network.requestIntercepted this way?
I think this script will be fine:
library(crrri)
chrome <- chr_connect()
url = "https://www.rstudio.com"
configured <-
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*")))
configured %>%
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
    configured %>% Network.continueInterceptedRequest(interceptionId = msg$params$interceptionId)
  })
configured %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>%
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>%
  { print(.$result)}
For the record, I think this works if you add an interceptionStage="HeadersReceived" argument to setRequestInterception, something like:
library(crrri)
chrome <- chr_connect("google-chrome", headless = TRUE)
url <- "https://rstudio.com"
configured <- chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*", interceptionStage="HeadersReceived")))
configured %>%
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request$url)
    configured %>%
      Network.continueInterceptedRequest(interceptionId = msg$params$interceptionId)
  })
ended <- configured %>%
  Page.navigate(url) %>%
  Page.loadEventFired() %>%
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>%
  { print("---"); print(substring(.$result, 0, 100))}
ended %...>% {print("Done"); chr_disconnect(chrome)}
This is great, many many thanks !
OK for reference, I tested the new API using this use case. https://gist.github.com/cderv/67d7ad8998559f2ce14b4eb4bb852fd1#file-request_interception-r