amazon-kinesis-aggregators

process base64 encoded object from kinesis

Open fridaystreet opened this issue 8 years ago • 9 comments

Hi,

Is there a built-in class for working with base64-encoded objects? If not, how would we go about supporting that?

Regards Paul

fridaystreet avatar Aug 11 '16 04:08 fridaystreet

Turns out we didn't need this.

fridaystreet avatar Aug 11 '16 06:08 fridaystreet

Data should be base64 decoded by the kinesis client. Do let me know if you have issues.

IanMeyers avatar Aug 11 '16 07:08 IanMeyers

No worries, thanks for the info. Yep, after a bit of testing we worked out that it was already being decoded. The root cause of the problem was actually that the decoded event data wasn't just a JSON string; it had other text as well as the JSON string. Then I discovered the filterRegex option, but I'm having a bit of trouble getting it to accept the regex.

Tried the following in the agg.json and get the error below

"filterRegex": "{"schema".*}"

com.amazonaws.services.kinesis.aggregators.app.AggregatorsBeanstalkApp.contextInitialized com.fasterxml.jackson.core.JsonParseException: Unrecognized character escape '{'

Cheers Paul

fridaystreet avatar Aug 11 '16 07:08 fridaystreet

Turns out "{"schema".*}" is the Java string required to pass in to Pattern.compile in order to make it match properly.

So I've updated agg.json with the following escaped version for json parsing.

"filterRegex": "\{\"schema\".*\}"

This seems to be accepted by the regex, but it's not matching anything. I can test the expression locally and it matches my JSON string inside the raw event data. I've confirmed that the regex string being used in the JSONSerializer is "{"schema".*}"

In fact I've even just hardcoded it in to p = Pattern.compile("\\{\"schema\".*\\}");//{this.filterRegex);

The raw event data is:

srv 2016-08-11 06:17:53.262 2016-08-11 06:17:53.019 2016-08-11 06:19:13.600 unstruct    d2aa8f52-97c6-4323-b9a3-c73914bbdb3a            rb-0.5.2    ssc-0.5.0-kinesis   kinesis-0.6.0-common-0.15.0     52.65.x.x               52c1709f-c02a-46e8-a452-208ef0c19df4                                                                                                                                                                    {"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.ec/interaction_event/jsonschema/1-0-0","data":{}}}                                                                           Ruby

Any assistance would be greatly appreciated.

fridaystreet avatar Aug 11 '16 08:08 fridaystreet
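Two things in the comment above can be checked in isolation. First, the escaping layers: the regex itself is \{"schema".*\}, a Java string literal for it doubles each backslash and escapes each quote, and a JSON string value needs the same doubling, because \{ is not a valid JSON escape (which is what the Jackson error reports). Second, a pattern that matches in a local tester may still fail if the calling code requires the whole input to match. The sketch below is only an illustration: the class and method names are made up, the event is a shortened stand-in for the real one, and which of the two calls the aggregator's JSON extractor actually uses is an assumption that would need checking in its source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of amazon-kinesis-aggregators.
class FilterRegexDemo {
    // Regex: \{"schema".*\}  written as a Java string literal.
    static final Pattern FILTER = Pattern.compile("\\{\"schema\".*\\}");

    // True only if the ENTIRE line matches the pattern.
    static boolean matchesWholeLine(String line) {
        return FILTER.matcher(line).matches();
    }

    // Returns the first substring that matches the pattern, or null if none.
    static String findEmbedded(String line) {
        Matcher m = FILTER.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the raw event shown above.
        String line = "srv 2016-08-11 06:17:53.262 unstruct "
                + "{\"schema\":\"x\",\"data\":{}} Ruby";
        System.out.println(matchesWholeLine(line)); // leading and trailing text present
        System.out.println(findEmbedded(line));     // just the embedded JSON
    }
}
```

If the extractor does a whole-input match, the leading fields and the trailing "Ruby" would be enough to make the filter reject every record, even though the same expression finds the JSON when used as a search.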

Can you paste the entire configuration you are using?

IanMeyers avatar Aug 11 '16 08:08 IanMeyers

Config below. I thought maybe there was a default delimiter that was splitting the full event on commas, so the regex was unable to match on the split strings. So I tried with the delimiter below as well, but I get the same error with or without the delimiter:

com.amazonaws.services.kinesis.io.JsonDataExtractor.getData Failed to deserialise any content for Record

[{
  "namespace":"interactions",
  "timeHorizons":["HOUR","DAY"],
  "type":"COUNT",
  "labelItems":["data.event_vendor_type"],
  "dateItem":"data.event_start_time_iso8601",
  "dateFormat":"yyyy-MM-dd'T'HH:mm:ssZ",
  "environment": "prod",
  "labelAttributeAlias": ["type"],
  "dateAttributeAlias": "time",
  "readIOPS":20,
  "writeIOPS":40,
  "emitMetrics":false,
  "lineTerminator": "<<<<",
  "dataExtractor":"JSON",
  "filterRegex": "\{\"schema\".*\}"
}]

Thanks Paul

fridaystreet avatar Aug 11 '16 08:08 fridaystreet

Also, I think the issue is that you may be trying to use a JSON configuration when in reality your events aren't actually JSON. You will (maybe unfortunately) need to set the Aggregator type to REGEX and use something like the following regular expression to match your values and lift them up as positional references:

(\w+) ([^\s]+ [^\s]+) ([^\s]+ [^\s]+) ([^\s]+ [^\s]+) (\w+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +\{\"schema\".*

What this does is extract the capture groups from your test event, where it finds the {"schema".*} value in the string. It then maps those elements positionally into the aggregator context. I am certainly open to an embedded JSON sort of data extractor, but today it doesn't work that way.

IanMeyers avatar Aug 11 '16 09:08 IanMeyers
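To make the positional idea concrete, here is a minimal sketch using a reduced version of the pattern above (three capture groups instead of eleven) against a shortened, made-up event. The class name, the reduced pattern, and the sample line are all assumptions for illustration; how the aggregator configuration then refers to each position is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of amazon-kinesis-aggregators.
class PositionalRegexDemo {
    // Reduced pattern: app id, one timestamp (date and time), event type,
    // then one or more spaces and the start of the JSON payload.
    static final Pattern EVENT =
            Pattern.compile("(\\w+) ([^\\s]+ [^\\s]+) (\\w+) +\\{\"schema\".*");

    // Returns the captured fields in positional order, or an empty list
    // if the line does not match the pattern as a whole.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = EVENT.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "srv 2016-08-11 06:17:53.262 unstruct    "
                + "{\"schema\":\"x\",\"data\":{}}   Ruby";
        // Position 1 is the app id, 2 the timestamp, 3 the event type.
        System.out.println(fields(line));
    }
}
```

Note that the JSON payload itself is only used as an anchor here and is not captured, so nothing inside it is available by position.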

Ah ok, I thought the filterRegex option on the JSON configuration would allow you to extract the JSON out of non-JSON event data and then process it as a single JSON event.

I don't really want to use the regex one; it's a bit messy and possibly a bit fragile to change. I guess I might have to create my own serialiser that can do that and use object as the dataExtractor option. I might use the JSON one as a base and just adapt it to handle this scenario.

Cheers Paul

fridaystreet avatar Aug 11 '16 10:08 fridaystreet
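As a rough sketch of the extraction step such a custom serialiser might start with, the helper below pulls the first balanced {...} region out of a mixed line by counting braces, rather than relying on a greedy .*\} match that runs to the last closing brace on the line. Everything here is hypothetical: the class and method names are invented, it does not implement any interface from amazon-kinesis-aggregators, and the returned string would still need to be handed to a JSON parser.

```java
// Hypothetical helper; not part of amazon-kinesis-aggregators.
class EmbeddedJsonExtractor {
    // Returns the first balanced {...} region beginning at `prefix`,
    // or null if the prefix is absent or the braces never balance.
    // Assumption: `prefix` itself starts with '{'.
    static String extract(String line, String prefix) {
        int start = line.indexOf(prefix);
        if (start < 0) {
            return null;
        }
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                 // skip the escaped character
                } else if (c == '"') {
                    inString = false;    // closing quote
                }
            } else if (c == '"') {
                inString = true;         // opening quote
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return line.substring(start, i + 1);
                }
            }
        }
        return null; // ran out of input before the braces balanced
    }

    public static void main(String[] args) {
        String line = "srv unstruct {\"schema\":\"a\",\"data\":{\"note\":\"x } y\"}} Ruby {not json}";
        System.out.println(extract(line, "{\"schema\""));
    }
}
```

Braces inside quoted strings are ignored, and trailing text that happens to contain a closing brace is left out, which is where a greedy regex would over-match.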

Sorry, totally unrelated, but could this be run in Lambda instead of Beanstalk?

If so, what would be required, and is it something you might be open to looking at?

Cheers Paul

fridaystreet avatar Aug 11 '16 10:08 fridaystreet