crrri icon indicating copy to clipboard operation
crrri copied to clipboard

Logic for returning results from a function that get XHR calls response body

Open bakaburg1 opened this issue 3 years ago • 3 comments

Hello, first of all compliments for your library, it's a great evolution over RSelenium (which I didn't like).

I created a function that allows to get the response body of resources requested by a page. The algorithm I made works well in interactive mode since I can wait for the requests to b fulfilled to collect the results, but I have no idea how to do this in an automatic function.

get_website_resources <- function(url, url_filter = '*', type_filter = '*') {

  chrome <- Chrome$new()

  out <- new.env()

  out$l <- list()

  client <- chrome$connect(callback = ~ NULL)

  Fetch <- client$Fetch
  Page <- client$Page

  Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
    Fetch$requestPaused(callback = function(params) {

      if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {

        Fetch$getResponseBody(requestId = params$requestId) %...>% {
          resp <- .

          if (resp$body != '') {
            if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()

            body <- list(list(
              url = params$request$url,
              response = resp
            )) %>% set_names(params$requestId)

            str(body)

            out$l <- append(out$l, body)
          }

        }
      }

      Fetch$continueRequest(requestId = params$requestId)
    })
  } %...>% {
    Page$navigate(url)
  }


 ## the following two lines should be called only after the resources have been collected
 #chrome$close() 
 #out$l
}

I have no idea if there is a smart way to intercept some relevant event. In alternative I thought of a timer or a counter that stops when a chosen number of resources is collected, but I wouldn't know even how to implement the last two (new to Promises).

Heres a use case: resources <- get_website_resources('https://app.powerbi.com/view?r=eyJrIjoiM2MxY2RkMTQtOTA3Mi00MDIxLWE1NDktZjlmYTdlNDg0MTdkIiwidCI6IjhkZDFlNmI0LThkYWMtNDA4ZS04ZDhkLTY3NTNlOTgwMDUzMCIsImMiOjl9', url_filter = 'querydata', type_filter = 'XHR')

Thank you! Angelo

bakaburg1 avatar Jul 21 '20 10:07 bakaburg1

Hello, I may have found a solution:

get_website_resources <- function(url, url_filter = '*', type_filter = '*', wait_for = 20) {

  out <- new.env()

  out$l <- list()

  perform_with_chrome( function(client) {
    Fetch <- client$Fetch
    Page <- client$Page

    client$inspect()

    Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
      Fetch$requestPaused(callback = function(params) {

        if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {

          Fetch$getResponseBody(requestId = params$requestId) %...>% {
            resp <- .

            if (resp$body != '') {
              if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()

              body <- list(list(
                url = params$request$url,
                response = resp
              )) %>% set_names(params$requestId)

              out$l <- append(out$l, body)
            }

          }
        }

        Fetch$continueRequest(requestId = params$requestId)
      })
    } %...>% {
      Page$navigate(url)
    } %>% wait(wait_for)
  })

  out$l
}

This does the trick but is not very efficient (20s is a lot but you need a good margin). Is there a way to resolve the promise instead based on a condition? In the javascript world, I would trigger an event if the correct number of resources is catcher, but I have no idea how to do it with R promises.

Thank you

bakaburg1 avatar Jul 21 '20 15:07 bakaburg1

Hi @bakaburg1,

We're glad you like crrri! I haven't had time to dive in your example but here is what I would do to create an async counter with R promises:

# we can use a closure to define an async counter:
async_counter <- function(resolve_at = 20) {
  # initialize objects
  count <- 0
  resolve_function <- NULL
  
  # build a promise and store the resolve function
  pr <- promises::promise(function(resolve, reject) {
    resolve_function <<- resolve
  })
  
  # build a function to increment the counter
  increment_counter <- function() {
    count <<- count + 1
    if (count >= resolve_at) {
      resolve_function(count)
    }
  }
  
  # return both the counter and the promise:
  list(increment_counter = increment_counter, promise = pr)
}

# demo #1 -----------------------------------------------------------------
my_counter <- async_counter(resolve_at = 1)
# check the promise:
my_counter$promise
# increment the counter
my_counter$increment_counter()
# check the promise
my_counter$promise

# demo #2 -----------------------------------------------------------------
my_counter <- async_counter(resolve_at = 1)
my_counter$promise$then(function(value) {
  cat("promise resolved, value:", value)
})
my_counter$increment_counter()

I hope it will help.

RLesur avatar Jul 21 '20 16:07 RLesur

Thank you! it took me a while but I was able to more or less understand the logic of what you did and integrate it in my function. My problem was to stop a promise started by the crrri client. I'm not sure if what I did was the most efficient solution: I enclosed all the async calls in a promise which also exports a resolve function in the parent environment. Once the conditions are met, the resolve function is called, closing the open async calls.

get_website_resources <- function(url, url_filter = '*', type_filter = '*', wait_for = 15, n_of_resources = NULL) {

  crrri::perform_with_chrome(function(client) {
    Fetch <- client$Fetch
    Page <- client$Page

    client$inspect() # not needed, just to check what's going on

    out <- new.env()

    out$results <- list()
    out$resolve_function <- NULL

    # wrap everything into a promise which produces a resolution function
    out$pr <- promises::promise(function(resolve, reject) {
      out$resolve_function <- resolve

      Fetch$enable(patterns = list(list(urlPattern="*", requestStage="Response"))) %...>% {
        Fetch$requestPaused(callback = function(params) {

          if (str_detect(params$request$url, url_filter) & str_detect(params$resourceType, type_filter)) {

            Fetch$getResponseBody(requestId = params$requestId) %...>% {
              resp <- .

              if (resp$body != '') {
                if (resp$base64Encoded) resp$body = base64_dec(resp$body) %>% rawToChar()

                body <- list(list(
                  url = params$request$url,
                  response = resp
                )) %>% set_names(params$requestId)

                #str(body)

                out$results <- append(out$results, body)

                if (!is.null(n_of_resources) & length(out$results) >= n_of_resources) out$resolve_function(out$results)
              }

            }
          }

          Fetch$continueRequest(requestId = params$requestId)
        })
      } %...>% {
        Page$navigate(url)
      } %>% crrri::wait(wait_for) %>%
        then(~ out$resolve_function(out$results))

    })

    out$pr$then(function(x) x)
  }, timeouts = max(wait_for + 3, 30), cleaning_timeout = max(wait_for + 3, 30))
}

bakaburg1 avatar Jul 22 '20 09:07 bakaburg1