paws icon indicating copy to clipboard operation
paws copied to clipboard

Use paws to authenticate a httr2 request?

Open hadley opened this issue 1 year ago • 8 comments

Is it possible to use paws to authenticate a request that I'm making with httr2? I want to perform a request to the bedrock runtime ConverseStream operation using the (very new) httr2::req_perform_connection().

(Related to #839)

hadley avatar Oct 09 '24 20:10 hadley

Is it possible to use paws to authenticate a request that I'm making with httr2?

In it's current state I don't think so 🤔. I believe the request to httr::VERB would have to be expose to translate it over to a httr2 request instead.

https://github.com/paws-r/paws/blob/5a37466b9ef25cc312310069fba89a9b9441fb1b/paws.common/R/net.R#L128-L136

I have been trying to think in how to handle streaming data and was pondering if a new class would be the best approach 🤔

Currently paws is using httr. I guess to fully utilise streaming functionality paws should update to httr2 and take advantage of all the benefits it offers. I am going on holiday for the next 3 weeks but when I get back I will try an experimental branch to migrate paws to httr2.

Sorry this isn't a full answer to your question.

DyfanJones avatar Oct 09 '24 21:10 DyfanJones

For the project I need it for, I might just bite the bullet and implement the AWS SigV4 signing protocol myself (I'm talking to a bunch of other LLMs with pure httr2 calls). I don't know how it would be to extract that logic out of paws into an exported function, but that would certainly make life easier for me.

hadley avatar Oct 09 '24 21:10 hadley

When I am back from holiday I am happy to expose paws's AWS SigV4 (it was on my todo list). That should make it simpler for you.

From my knowledge you might need to convert some of the raw response into int8, int16, int32, int64 and uint8, uint16, uint32, uint64. To extract some key information before parse the message back

https://github.com/boto/botocore/blob/8e2e8fd7ab59f8c1337902acc32d2ee10cb184ad/botocore/eventstream.py

DyfanJones avatar Oct 09 '24 21:10 DyfanJones

I have been playing around with some ideas in how to do this in R. The pkd package looks like a useful however it isn't on cran.

I have managed to implement a method in R:

big_endian <- function(vec, dtype) {
  switch(
    dtype,
    "int64" = c(
      vec[8:1], vec[16:9], vec[24:17], vec[32:25], vec[40:33], vec[48:41], vec[56:49], vec[64:57]
    ),
    "int32" = c(vec[8:1], vec[16:9], vec[24:17], vec[32:25]),
    "int16" = c(vec[8:1], vec[16:9]),
    "int8" = vec[8:1]
  )
}

int_to_uint <- function (x, adjustment=2^32) {
  if (sign(x) < 0) {
    return(x + adjustment)
  }
  return(x)
}

# Convert raw vector into integers with big-endian
int64 <- function(x) {
  bits <- as.integer(big_endian(rawToBits(x), "int64"))
  sum(bits[-1] * 2^(62:0)) - bits[[1]] * 2^63
}

int32 <- function(x) {
  bits <- as.integer(big_endian(rawToBits(x), "int32"))
  sum(bits[-1] * 2^(30:0)) - bits[[1]] * 2^31
}

int16 <- function(x) {
  bits <- as.integer(big_endian(rawToBits(x), "int16"))
  sum(bits[-1] * 2^(14:0)) - bits[[1]] * 2^15
}

int8 <- function(x) {
  bits <- as.integer(big_endian(rawToBits(x), "int8"))
  sum(bits[-1] * 2^(6:0)) - bits[[1]] * 2^7
}

# Converts raw vector into unsigned integers with big-endian
uint64 <- function(x) {
  int_to_uint(int64(x), 2^64)
}

uint32 <- function(x) {
  int_to_uint(readBin(x, "integer", n=length(x), size = 4, endian = "big"))
}

uint16 <- function(x) {
  readBin(x, "integer", n=length(x), size=2, signed = F, endian = "big")
}

uint8 <- function(x) {
  readBin(x, "integer", n=length(x), size=1, signed = F, endian = "big")
}

obj <- openssl::rand_bytes(8)

uint8(obj[1])
#> [1] 228
uint16(obj[1:2])
#> [1] 58508
uint32(obj[1:4])
#> [1] 3834393554
uint64(obj)
#> [1] 1.646859e+19


int8(obj[1])
#> [1] -28
int16(obj[1:2])
#> [1] -7028
int32(obj[1:4])
#> [1] -460573742
int64(obj)
#> [1] -1.978149e+18

pkd::uint8(obj[1])
#> <pkd_uint8[1]>
#> [1] 228
pkd::uint16(obj[1:2], endian = 0)
#> <pkd_uint16[1]>
#> [1] 58508
pkd::uint32(obj[1:4], endian = 0)
#> <pkd_uint32[1]>
#> [1] 3834393554
pkd::uint64(obj, endian = 0)
#> <pkd_uint64[1]>
#> [1] 1.646859e+19


pkd::int8(obj[1])
#> <pkd_int8[1]>
#> [1] -28
pkd::int16(obj[1:2], endian = 0)
#> <pkd_int16[1]>
#> [1] -7028
pkd::int32(obj[1:4], endian = 0)
#> <pkd_int32[1]>
#> [1] -460573742
pkd::int64(obj, endian = 0)
#> <pkd_int64[1]>
#> [1] -1.978149e+18

Created on 2024-10-09 with reprex v2.1.1

DyfanJones avatar Oct 09 '24 21:10 DyfanJones

I was hoping it would use server-sent events like every other API 😭

hadley avatar Oct 09 '24 21:10 hadley

AWS can be a bit of a pain at times :) If you managed to get a working prototype I would be really interested as it should help with the implementation in paws. :)

DyfanJones avatar Oct 09 '24 21:10 DyfanJones

@jcheng5 discovered that curl actually has a native implementation: https://curl.se/libcurl/c/CURLOPT_AWS_SIGV4.html. So auth, at least, will be easier than expected.

hadley avatar Oct 10 '24 12:10 hadley

Some docs for the protocol at https://docs.aws.amazon.com/transcribe/latest/dg/streaming-setting-up.html#streaming-event-stream. I'm going to try and parse this in httr2 so you'll be able to use it if desired.

hadley avatar Oct 21 '24 00:10 hadley

And implemented in https://github.com/r-lib/httr2/pull/571 😄

Here's an example of what streaming code looks like:

creds <- paws.common::locate_credentials()
model_id <- "anthropic.claude-3-5-sonnet-20240620-v1:0"
req <- request("https://bedrock-runtime.us-east-1.amazonaws.com")
req <- req_url_path_append(req, "model", model_id, "converse-stream")
req <- req_body_json(req, list(
  messages = list(list(
    role = "user",
    content = list(list(text = "What's your name?"))
  ))
))
req <- req_auth_aws_v4(
  req,
  aws_access_key_id = creds$access_key_id,
  aws_secret_access_key = creds$secret_access_key,
  aws_session_token = creds$session_token
)

con <- req_perform_connection(req)
repeat{
  event <- resp_stream_aws(con)
  if (is.null(event)) {
    close(con)
    break
  }

  str(event)
}

hadley avatar Oct 23 '24 22:10 hadley

This is great! I will take a proper look at this once I get back from my holiday :)

DyfanJones avatar Oct 24 '24 00:10 DyfanJones

🤔 From looking at this, I believe paws will need to expose the connection to allow for streaming.

DyfanJones avatar Nov 04 '24 17:11 DyfanJones

You could also provide a call back interface (like the older req_perform_stream()) but returning the connection object gives the user maximum flexibility.

hadley avatar Nov 04 '24 17:11 hadley

This is alot more work than I initially thought. I will list stuff I need to do, to get streaming in paws SDK properly

  • [x] Update paws backend to httr2 from httr
  • [x] Identify which methods need streaming handlers (Aws API jsons)
  • [x] Expose stream API into method operations (make paws)
  • [x] Regenerate paws SDK with stream_api identifier
  • [x] Allow connections to be passed to unmarshal methods
  • [x] New StreamHandler
  • [x] Error Handling for new StreamHandler
  • [x] Unit tests for StreamHandler
  • [x] content-type json
  • [x] content-type xml
  • [x] Documentation for StreamHandler

DyfanJones avatar Nov 19 '24 21:11 DyfanJones

Side note: just noticed aws-sdk-js api JSONs have stopped being updated. For the short term will switch to botocore JSON files. Long term will need to more to smithy.

DyfanJones avatar Nov 19 '24 21:11 DyfanJones

Initial dev design:

library(paws)
library(httr2)

client <- bedrockruntime(region = "us-east-1")

model_id <- "amazon.titan-text-lite-v1:0"

resp <- client$converse_stream(
  modelId = model_id,
  messages = list(
    list(
      role = "user",
      content = list(list(text = "What's your name?"))
    ))
)

# Return httr2 req_performance_connection for full flexibility
con <- resp$stream(.connection = T)

repeat{
  event <- resp_stream_aws(con)
  if (is.null(event)) {
    close(con)
    break
  }
  
  str(event)
}

# OR
# Utilise paws unmarshal methods to parse response
resp$stream(\(chunk) print(chunk$contentBlockDelta$delta$text))

I think this initial design should give best of both worlds. I just need to do the plumbing for the paws unmarshal methods and capture stream error. But it is looking promising :)

DyfanJones avatar Nov 21 '24 16:11 DyfanJones

Plus Could always create a similar function to httr2:: resp_stream_aws but return the operations expected output: https://www.paws-r-sdk.com/docs/bedrockruntime_converse_stream/

library(paws)

client <- bedrockruntime(region = "us-east-1")

model_id <- "amazon.titan-text-lite-v1:0"

resp <- client$converse_stream(
  modelId = model_id,
  messages = list(
    list(
      role = "user",
      content = list(list(text = "What's your name?"))
    ))
)

# Return httr2 req_performance_connection for full flexibility
con <- resp$stream(.connection = T)

while(!is.null(event <- paws_stream_parser(con))) {
    print(chunk$contentBlockDelta$delta$text)
}
close(con)

I think these 3 options could give alot of flexibility to the user.

DyfanJones avatar Nov 21 '24 16:11 DyfanJones

Nice!

hadley avatar Dec 10 '24 06:12 hadley

@hadley paws 0.8.0 is on r-universe. Please have a go 😄

install.packages('paws', repos = c(pawsr = 'https://paws-r.r-universe.dev', CRAN = 'https://cloud.r-project.org'))

DyfanJones avatar Dec 10 '24 09:12 DyfanJones

Planned cran release Feb 07. Apologies in the long wait. Paws 0.8.0 has a lot of updates and I didn't want to release without extra checks.

DyfanJones avatar Jan 18 '25 14:01 DyfanJones

paws 0.8.0 is finally on cran :D and streaming is now supported along will exposing the httr2 connection object for full flexibility :)

DyfanJones avatar Feb 10 '25 13:02 DyfanJones