
Kinesis client putrecord issue?

Open edgBR opened this issue 4 years ago • 3 comments

Dear colleagues,

I am trying to run an R script on an EC2 instance that I am using as a producer.

The code is as follows and is based on the code from A Cloud Guru's AWS Certified Machine Learning course (https://github.com/ACloudGuru-Resources/Course_AWS_Certified_Machine_Learning/blob/master/Chapter3/put-record-python-program.py):

library(httr)
library(paws)
library(jsonlite)
library(lubridate)
library(uuid)
library(dplyr)

client = paws::kinesis(config = list(region = "us-east-1"))
partition_key <- uuid::UUIDgenerate(n=1)

# Added 08/2020 since randomuser.me is starting to throttle API calls
# The following code loads 500 random users into memory
number_of_results <- 500

request <-
  httr::GET(url = paste0('https://randomuser.me/api/?exc=login&results=', number_of_results))
data <- request %>% content()
data <- data$results

while (TRUE) {
  # The following chooses a random user from the 500 random users pulled from the API in a single API call.
  random_user_index <-
    runif(n = 1, min = 0, max = number_of_results - 1) %>% as.integer()
  random_user <- data[random_user_index]
  random_user <- toJSON(random_user)
  client$put_record(StreamName = "my_stream",
                    Data = random_user,
                    PartitionKey = partition_key)
  Sys.sleep(runif(n = 1, min = 0, max = 1))
  
}

However I am getting the following error:

Error in file(what, "rb") : cannot open the connection
Calls: <Anonymous> ... convert_blob -> raw_to_base64 -> <Anonymous> -> file
In addition: Warning message:
In file(what, "rb") :
  cannot open file '[{"gender":["female"],"name":{"title":["Madame"],"first":["Alisha"],"last":["Denis"]},"location":{"street":{"number":[7221],"name":["Rue du Village"]},"city":["Hüttlingen"],"state":["Basel-Landschaft"],"country":["Switzerland"],"postcode":[6851],"coordinates":{"latitude":["-25.4605"],"longitude":["88.3460"]},"timezone":{"offset":["+6:00"],"description":["Almaty, Dhaka, Colombo"]}},"email":["[email protected]"],"dob":{"date":["1947-06-02T01:05:06.443Z"],"age":[74]},"registered":{"date":["2019-02-10T05:22:41.058Z"],"age":[2]},"phone":["077 098 25 01"],"cell":["075 948 41 92"],"id":{"name":["AVS"],"value":["756.4495.7678.82"]},"picture":{"large":["https://randomuser.me/api/portraits/women/75.jpg"],"medium":["https://randomuser.me/api/portraits/med/women/75.jpg"],"thumbnail":["https://randomuser.me/api/portraits/thumb/women/75.jpg"]},"nat":["CH"]}]': File name too long
Execution halted

The second line of the error makes me wonder whether I need an intermediate file for this operation. I have also tried jsonlite::base64_enc, but that does not work either.

Could someone point out what the issues are here?

Attaching sessionInfo:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-koji-linux-gnu (64-bit)
Running under: Amazon Linux 2

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_1.0.5      uuid_0.1-4       lubridate_1.7.10 jsonlite_1.7.2
[5] paws_0.1.10      httr_1.4.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6       fansi_0.4.2      crayon_1.4.1     utf8_1.1.4
 [5] R6_2.5.0         lifecycle_1.0.0  magrittr_2.0.1   pillar_1.5.1
 [9] rlang_0.4.10     vctrs_0.3.6      generics_0.1.0   ellipsis_0.3.1
[13] glue_1.4.2       purrr_0.3.4      compiler_4.0.2   pkgconfig_2.0.3
[17] tidyselect_1.1.0 tibble_3.1.0
>

edgBR avatar Mar 07 '21 19:03 edgBR

Sorry about that. Currently the Data parameter expects a binary object, e.g. charToRaw(random_user), but we'll try to fix it so it works like the Python SDK. The update won't be on CRAN for a couple weeks though because we just updated and we're limited to one update every 30 days.
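
A minimal sketch of the workaround described above, assuming the JSON text is already available as a single character string. The literal JSON string and the send_record() helper are hypothetical and only for illustration; the stream name is taken from the original script. The idea is to pass a raw vector to Data instead of a character string, so that the string is not mistaken for a file name.

```r
# Hypothetical stand-in for the string that toJSON(random_user) produces
# in the original script.
json_string <- '{"gender":"female","name":{"first":"Alisha","last":"Denis"}}'

# Convert the character string to a raw vector, the binary form that the
# Data parameter currently expects.
payload <- charToRaw(json_string)

# The call itself needs AWS credentials and an existing stream, so it is
# wrapped in a helper (hypothetical name) instead of being run here.
send_record <- function(client, payload, partition_key) {
  client$put_record(
    StreamName = "my_stream",
    Data = payload,
    PartitionKey = partition_key
  )
}
```

With a configured client, this would be called as send_record(client, payload, partition_key). If the result of toJSON() is used directly, wrapping it in as.character() first drops its "json" class before the conversion.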

davidkretch avatar Mar 07 '21 19:03 davidkretch

Hi @davidkretch,

I think aligning the behaviour with boto3 will help a lot and will also increase the adoption of paws.

Do you have a list of the paws methods that expect a binary object?

BR /Edgar

edgBR avatar Apr 05 '21 13:04 edgBR

Unfortunately, there is no list of methods that expect a binary object. Paws is generated from AWS's own API definitions, so a parameter takes a binary object wherever the AWS API itself expects one. For example, for Kinesis's put_record operation, the documentation states:

"The data blob to put into the record, which is base64-encoded when the blob is serialized. When the data blob (the payload before base64-encoding) is added to the partition key size, the total size must not exceed the maximum record size."

Python's SDK is obviously being more helpful in this case, in that you don't have to provide the blob yourself. We'll need to look into what Python is doing -- different services might have different needs. S3 for example has similar requirements, but in that case Paws has a custom way of handling them that is particular to S3.
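
The size rule in the quoted documentation can be checked before sending a record. This is a rough sketch only: record_fits() is a hypothetical helper, and the 1 MiB default is an assumed limit for illustration, so check the current Kinesis quota before relying on it.

```r
# Assumed limit for illustration only; the real maximum record size
# should be taken from the current Kinesis documentation.
max_record_bytes <- 1024 * 1024

# Hypothetical helper: TRUE when the payload (before base64 encoding)
# plus the partition key fits within the limit, measured in bytes.
record_fits <- function(payload, partition_key, limit = max_record_bytes) {
  stopifnot(is.raw(payload),
            is.character(partition_key),
            length(partition_key) == 1)
  length(payload) + nchar(partition_key, type = "bytes") <= limit
}

payload <- charToRaw('{"gender":"female"}')
partition_key <- "example-partition-key"  # hypothetical key

record_fits(payload, partition_key)
```

The payload is measured as a raw vector because the quoted rule applies to the blob before base64 encoding.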

davidkretch avatar Apr 11 '21 22:04 davidkretch