kinesis: cross account kinesis streams
Hi there!
I've been doing a POC on Arroyo and have it deployed and set up in my EKS cluster. I was just curious whether I can read a Kinesis source from another account, as long as the IAM role associated with the service account (SA) has the correct cross-account policies?
Or will Arroyo only read from Kinesis streams that belong to the same account as the EKS cluster? I don't see any documentation on this use case.
Last question: can the Kinesis source take an endpoint instead of a stream name?
thanks!
Arroyo uses the standard AWS config chain to handle auth for Kinesis, so it should work as expected if you give the right IAM permissions to the node. There are various ways to do that, but for EKS we recommend using IAM Roles for Service Accounts (IRSA). The documentation here for S3 also applies to other AWS services like Kinesis: https://doc.arroyo.dev/deployment/kubernetes#amazon-eks.
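For reference, a minimal IRSA setup looks something like the sketch below. The service account name and role ARN are placeholders; the role would live in (or be assumable from) the account that owns the stream, and its policy needs the Kinesis read actions.

```yaml
# Hypothetical ServiceAccount for the Arroyo pods, annotated with an IAM
# role via IRSA. The role's policy must grant kinesis:DescribeStream,
# kinesis:GetShardIterator, kinesis:GetRecords, and kinesis:ListShards
# on the cross-account stream ARN.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arroyo-pod-sa  # placeholder: the SA your Arroyo pods run as
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/arroyo-kinesis-role
```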
I don't think it's possible to set the endpoint right now, but that's a simple feature if needed.
@mwylde great news! I have IAM Service accounts in the cluster, so I should be good to go.
I don't think I need the endpoint right now ... I'll try with just the stream name, but maybe I'll open a feature request for the endpoint if I do end up needing it.
thanks again
@mwylde quick question - would I be able to use multiple custom UDFs on raw bytes to decode and parse them into something usable downstream?
Something like this example
CREATE TABLE stream (
timestamp String,
record Bytes
) WITH (
connector = 'kinesis',
stream_name = 'my_stream',
type = 'source',
format = 'raw_bytes',
'source.offset' = 'latest'
);
CREATE VIEW records AS (
SELECT * FROM (
SELECT parse(decode(record))
FROM stream)
);
SELECT * FROM records;
where the decode UDF would do whatever is needed to decode the raw bytes from the Kinesis stream, and the parse UDF would turn that data into a schema we can query?
The streaming SQL might not be correct, but hopefully it conveys the question.
Yes, you can arbitrarily nest UDFs. The main difficulty is that UDFs can't produce structs as outputs yet, so you need to parse into something that can be used by the next UDF. The easiest path is to convert your binary format to JSON, then you can use the built-in JSON functions or serde_json in a UDF.
See the example here doing this with CBOR: https://www.arroyo.dev/blog/arroyo-0-11-0#raw-bytes
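To make the "binary → JSON string → built-in JSON functions" path concrete, here's a minimal sketch of such a UDF pair. It assumes a hypothetical length-prefixed binary layout (4-byte big-endian length followed by a UTF-8 payload) rather than any real format; in Arroyo the functions would carry the `#[udf]` attribute, but they're shown as plain Rust so the logic is self-contained.

```rust
// Hypothetical decode step: extract the UTF-8 payload from a
// length-prefixed record (4-byte big-endian length + payload).
// In an Arroyo UDF this would be annotated with #[udf].
fn decode(record: &[u8]) -> Option<String> {
    if record.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([record[0], record[1], record[2], record[3]]) as usize;
    let payload = record.get(4..4 + len)?;
    String::from_utf8(payload.to_vec()).ok()
}

// Hypothetical parse step: wrap the decoded text as a JSON string so
// downstream SQL can use the built-in JSON functions. A real UDF would
// typically build this with serde_json instead of string formatting.
fn parse(decoded: &str) -> String {
    format!("{{\"message\":\"{}\"}}", decoded.replace('"', "\\\""))
}
```

Since UDFs can't return structs yet, the JSON string coming out of `parse` is the hand-off point: the view would call the built-in JSON functions on it to project out queryable columns.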
ahh perfect!