crrri
crrri copied to clipboard
Logic for returning results from a function that get XHR calls response body
Hello, first of all compliments for your library, it's a great evolution over RSelenium (which I didn't like).
I created a function that allows to get the response body of resources requested by a page. The algorithm I made works well in interactive mode since I can wait for the requests to b fulfilled to collect the results, but I have no idea how to do this in an automatic function.
get_website_resources <- function(url, url_filter = '*', type_filter = '*') {
chrome <- Chrome$new()
out <- new.env()
out$l <- list()
client <- chrome$connect(callback = ~ NULL)
Fetch <- client$Fetch
Page <- client$Page
Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
Fetch$requestPaused(callback = function(params) {
if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {
Fetch$getResponseBody(requestId = params$requestId) %...>% {
resp <- .
if (resp$body != '') {
if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()
body <- list(list(
url = params$request$url,
response = resp
)) %>% set_names(params$requestId)
str(body)
out$l <- append(out$l, body)
}
}
}
Fetch$continueRequest(requestId = params$requestId)
})
} %...>% {
Page$navigate(url)
}
## the following two lines should be called only after the resources have been collected
#chrome$close()
#out$l
}
I have no idea if there is a smart way to intercept some relevant event. In alternative I thought of a timer or a counter that stops when a chosen number of resources is collected, but I wouldn't know even how to implement the last two (new to Promises).
Heres a use case:
resources <- get_website_resources('https://app.powerbi.com/view?r=eyJrIjoiM2MxY2RkMTQtOTA3Mi00MDIxLWE1NDktZjlmYTdlNDg0MTdkIiwidCI6IjhkZDFlNmI0LThkYWMtNDA4ZS04ZDhkLTY3NTNlOTgwMDUzMCIsImMiOjl9', url_filter = 'querydata', type_filter = 'XHR')
Thank you! Angelo
Hello, I may have found a solution:
get_website_resources <- function(url, url_filter = '*', type_filter = '*', wait_for = 20) {
out <- new.env()
out$l <- list()
perform_with_chrome( function(client) {
Fetch <- client$Fetch
Page <- client$Page
client$inspect()
Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
Fetch$requestPaused(callback = function(params) {
if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {
Fetch$getResponseBody(requestId = params$requestId) %...>% {
resp <- .
if (resp$body != '') {
if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()
body <- list(list(
url = params$request$url,
response = resp
)) %>% set_names(params$requestId)
out$l <- append(out$l, body)
}
}
}
Fetch$continueRequest(requestId = params$requestId)
})
} %...>% {
Page$navigate(url)
} %>% wait(wait_for)
})
out$l
}
This does the trick but is not very efficient (20s is a lot but you need a good margin). Is there a way to resolve the promise instead based on a condition? In the javascript world, I would trigger an event if the correct number of resources is catcher, but I have no idea how to do it with R promises.
Thank you
Hi @bakaburg1,
We're glad you like crrri! I haven't had time to dive in your example but here is what I would do to create an async counter with R promises:
# we can use a closure to define an async counter:
async_counter <- function(resolve_at = 20) {
# initialize objects
count <- 0
resolve_function <- NULL
# build a promise and store the resolve function
pr <- promises::promise(function(resolve, reject) {
resolve_function <<- resolve
})
# build a function to increment the counter
increment_counter <- function() {
count <<- count + 1
if (count >= resolve_at) {
resolve_function(count)
}
}
# return both the counter and the promise:
list(increment_counter = increment_counter, promise = pr)
}
# demo #1 -----------------------------------------------------------------
my_counter <- async_counter(resolve_at = 1)
# check the promise:
my_counter$promise
# increment the counter
my_counter$increment_counter()
# check the promise
my_counter$promise
# demo #2 -----------------------------------------------------------------
my_counter <- async_counter(resolve_at = 1)
my_counter$promise$then(function(value) {
cat("promise resolved, value:", value)
})
my_counter$increment_counter()
I hope it will help.
Thank you! it took me a while but I was able to more or less understand the logic of what you did and integrate it in my function. My problem was to stop a promise started by the crrri client. I'm not sure if what I did was the most efficient solution: I enclosed all the async calls in a promise which also exports a resolve function in the parent environment. Once the conditions are met, the resolve function is called, closing the open async calls.
get_website_resources <- function(url, url_filter = '*', type_filter = '*', wait_for = 15, n_of_resources = NULL) {
crrri::perform_with_chrome(function(client) {
Fetch <- client$Fetch
Page <- client$Page
client$inspect() # not needed, just to check what's going on
out <- new.env()
out$results <- list()
out$resolve_function <- NULL
# wrap everything into a promise which produces a resolution function
out$pr <- promises::promise(function(resolve, reject) {
out$resolve_function <- resolve
Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
Fetch$requestPaused(callback = function(params) {
if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {
Fetch$getResponseBody(requestId = params$requestId) %...>% {
resp <- .
if (resp$body != '') {
if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()
body <- list(list(
url = params$request$url,
response = resp
)) %>% set_names(params$requestId)
#str(body)
out$results <- append(out$results, body)
if (!is.null(n_of_resources) & length(out$results) >= n_of_resources) out$resolve_function(out$results)
}
}
}
Fetch$continueRequest(requestId = params$requestId)
})
} %...>% {
Page$navigate(url)
} %>% crrri::wait(wait_for) %>%
then(~ out$resolve_function(out$results))
})
out$pr$then(function(x) x)
}, timeouts = max(wait_for + 3, 30), cleaning_timeout = max(wait_for + 3, 30))
}