
Use of Network.setRequestInterception

Open juba opened this issue 5 years ago • 13 comments

Hi,

If I want to check a page and get some information about its network operations, I can do something like the following:

url <- "https://www.r-project.org/"

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.responseReceived() %...T>% {
      print("received")
      print(.$result$response$url)
    }
)

However, I'd like to use Network.setRequestInterception to capture only certain requests. I tried it this way, but it doesn't seem to work:

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
    Network.requestIntercepted() %...T>% {
      print("intercepted")
    }
)

Do you have any idea what I'm doing wrong?

Thanks !

juba avatar Feb 28 '19 15:02 juba

For this kind of task, I prefer the following script, which works:

library(crrri)

chrome <- chr_connect()
url = "https://www.rstudio.com"

intercepted <- 
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted() %...T>% {
    print("intercepted")
  }

chrome %>%
  Page.navigate(url)

chr_disconnect(chrome)

This first creates a promise for the intercepted network request, then opens the URL.

RLesur avatar Feb 28 '19 15:02 RLesur

If you want to execute a callback for each intercepted network request, you can use the .callback argument:

library(crrri)

chrome <- chr_connect()
url = "https://www.rstudio.com"

chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
  })

chrome %>%
  Page.navigate(url)

chrome %>%
  Page.navigate("https://cdn.rawgit.com/juba/rmdformats/master/resources/examples/material/material.html")

chr_disconnect(chrome)

RLesur avatar Feb 28 '19 15:02 RLesur

Thanks !

One more question about your first example: if I run all the code at once, the request is not intercepted. I have to run the intercepted promise, wait a little, and then run Page.navigate. Putting a Sys.sleep() in between doesn't seem to help either.

And if I try with headless=FALSE, I get an error when running the Page.navigate promise:

[error] handle_read_frame error: websocketpp.transport:7 (End of File)

juba avatar Feb 28 '19 16:02 juba

You're right, I wrote an "interactive" script. My mistake.

library(crrri)

chrome <- chr_connect()
url = "https://www.rstudio.com"

configured <- 
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) 

intercepted <- 
  configured %>%
  Network.requestIntercepted() %...T>% {
    print("intercepted")
  }

configured %>%
  Page.navigate(url) %...!% {
  }

intercepted %...>% {
  chr_disconnect(chrome)
}

Using a callback would be a better idea for this use case.

RLesur avatar Feb 28 '19 17:02 RLesur

What another great use case for the event emitter API! I'll post a new example soon hoping it will work ok!

cderv avatar Feb 28 '19 18:02 cderv

Ok, one more thing I fear...

Your last example works fine when using promises only, but it seems I'm in trouble if I try to use callbacks :

library(crrri)
chrome <- chr_connect("google-chrome") 
url = "https://www.rstudio.com" 

configured <- chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>% 
  Network.requestIntercepted(.callback = function(msg) { 
    print(msg$params$request) 
  }) 

ended <- configured %>% 
  Page.navigate(url) %>% 
  Page.loadEventFired() %>%
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~.$root$nodeId) %...T>%
  { print(.$result)}

ended %...>% { chr_disconnect(chrome) }

In this case I get an error: Unhandled promise error: objet de type 'closure' non indiçable (in English: object of type 'closure' is not subsettable). It seems that the promise returned when using .callback "breaks" the pipeline.

Once again, I'm sorry if this is a misunderstanding on my part.

juba avatar Mar 01 '19 12:03 juba

There are 2 different topics: one related to crrri and the other one related to the use of Chrome DevTools.

Using a callback in an event listener

As stated in the documentation, an event listener returns:

An async value of class promise. The value and the completion of the promise differ according to the use of a callback function. [...] When .callback is not NULL, the promise is fulfilled as soon as the callback is created; the value is a function without any argument that can be called to cancel the callback. When you use the .callback argument, you cannot send the result to any other command or event listener.

That means that the configured promise cannot be used as in this example.

Using Network.setRequestInterception()

In the example, every request is intercepted. When the Page.navigate command is used, its own request is intercepted too, and since nothing continues it, it is never sent. So the load event never fires (no request is sent, so no response ever arrives).

Here's a modified and commented version of the last example:

library(crrri)
chrome <- chr_connect() 
url = "https://www.rstudio.com" 

configured <- 
  chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  # intercept all the requests (why?):
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>% 
  Network.requestIntercepted(.callback = function(msg) { 
    print(msg$params$request) 
  }) 

remove_callback <- function() {
  configured %...>%
    do.call(list())
}

# If you want to remove the callback, use:
# remove_callback()

ended <- # WARNING: this promise will never be resolved (see below)
  configured %...>% {
  chrome %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>% # Since all the requests are intercepted, the load event will never fire
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~.$root$nodeId) %...T>%
  { print(.$result)}
}

# WARNING: The ended promise will never be resolved (use a timeout!)
ended %...>% { chr_disconnect(chrome) }
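
Since the "*" pattern selects every request, a narrower urlPattern would limit what gets held back. Here is a minimal sketch for previewing offline which URLs a pattern would select. It assumes DevTools wildcards behave like shell globs ("*" for any run of characters, "?" for exactly one). matches_pattern is a hypothetical helper, not part of crrri, and the example URLs are made up for illustration.

```r
# Hypothetical helper (not part of crrri): approximate a DevTools urlPattern
# with an R glob, assuming "*" and "?" behave like shell wildcards.
matches_pattern <- function(urls, pattern) {
  grepl(utils::glob2rx(pattern), urls)
}

# Made-up URLs, for illustration only
urls <- c(
  "https://www.rstudio.com/",
  "https://www.rstudio.com/assets/site.css",
  "https://www.rstudio.com/assets/app.js"
)

all_requests <- matches_pattern(urls, "*")      # selects every URL
css_only     <- matches_pattern(urls, "*.css")  # selects only the stylesheet

print(urls[css_only])
```

The same string would then go into the call used above, for example Network.setRequestInterception(patterns = list(list(urlPattern = "*.css"))). Requests that match the pattern still have to be continued, otherwise they stay blocked.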

RLesur avatar Mar 01 '19 14:03 RLesur

Ok, thanks for the explanation. I didn't find this in the DevTools documentation, but that's what I think I mostly understood.

The way I see it, the only way for it to work would be to issue a Network.continueInterceptedRequest after the requestIntercepted. This works when using only promises, but with a callback in requestIntercepted, I don't see how it could be possible, as the promise pipeline is "broken".

Anyway I think I can achieve mostly what I'm trying to do by using responseReceived, which seems to be non-blocking.

juba avatar Mar 01 '19 14:03 juba

You can use Network.continueInterceptedRequest like this:

library(crrri)
chrome <- chr_connect() 
url = "https://www.rstudio.com" 

configured <- 
  chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  # intercept all the requests:
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) 


configured %>% 
  Network.requestIntercepted() %...>% {
    print(.$result$request)
    Network.continueInterceptedRequest(.$ws, interceptionId = .$result$interceptionId)
  }
  
configured %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>% 
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>%
  { print(.$result)}

RLesur avatar Mar 01 '19 14:03 RLesur

Yes, this works, but can you also use a callback in Network.requestIntercepted this way?

juba avatar Mar 01 '19 14:03 juba

I think this script will be fine:

library(crrri)
chrome <- chr_connect() 
url = "https://www.rstudio.com" 

configured <- 
  chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) 


configured %>% 
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
    configured %>% Network.continueInterceptedRequest(interceptionId = msg$params$interceptionId)
  })
  
configured %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>% 
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>%
  { print(.$result)}

RLesur avatar Mar 01 '19 15:03 RLesur

For the record, I think this works if you add an interceptionStage="HeadersReceived" argument to setRequestInterception, something like:

library(crrri) 
chrome <- chr_connect("google-chrome", headless = TRUE) 
url <- "https://rstudio.com" 
configured <- chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  Network.setRequestInterception(patterns = list(list(urlPattern="*", interceptionStage="HeadersReceived"))) 
configured %>% 
  Network.requestIntercepted(.callback = function(msg) { 
    print(msg$params$request$url) 
    configured %>% 
      Network.continueInterceptedRequest(interceptionId = msg$params$interceptionId) 
  }) 
ended <- configured %>% 
  Page.navigate(url) %>% 
  Page.loadEventFired() %>% 
  DOM.getDocument() %>% 
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>% 
  { print("---"); print(substring(.$result, 0, 100))}
ended %...>% {print("Done"); chr_disconnect(chrome)}

This is great, many many thanks !

juba avatar Mar 01 '19 15:03 juba

What another great use case for the event emitter API! I'll post a new example soon hoping it will work ok!

OK for reference, I tested the new API using this use case. https://gist.github.com/cderv/67d7ad8998559f2ce14b4eb4bb852fd1#file-request_interception-r

cderv avatar Mar 02 '19 15:03 cderv