amazon-kinesis-aggregators
process base64 encoded object from kinesis
Hi,
Is there a built-in class for working with base64-encoded objects? If not, how would we go about supporting that?
Regards Paul
turns out we didn't need this
Data should be base64 decoded by the Kinesis client. Do let me know if you have issues.
No worries, thanks for the info. Yep, after a bit of testing we worked out that it was already being decoded. The root cause of the problem was that the decoded event data wasn't just a JSON string; it had other text around the JSON. Then I discovered filterRegex, but I'm having a bit of trouble getting it to accept the regex.
I tried the following in agg.json and got the error below:
"filterRegex": "\{"schema".*\}"
com.amazonaws.services.kinesis.aggregators.app.AggregatorsBeanstalkApp.contextInitialized com.fasterxml.jackson.core.JsonParseException: Unrecognized character escape '{'
Cheers Paul
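For context on the parse error above: JSON string literals only allow the escapes \" \\ \/ \b \f \n \r \t and \uXXXX, so a lone backslash in front of { is rejected before the regex engine ever sees it. A minimal sketch of the escaping step follows. The class and method names are hypothetical and not part of amazon-kinesis-aggregators; it only shows how a regex would need to be written inside a JSON string.

```java
// Hypothetical helper for illustration; not part of amazon-kinesis-aggregators.
class FilterRegexEscaping {

    /**
     * Escapes backslashes and double quotes so that the given text can be
     * placed between the quotes of a JSON string literal.
     */
    static String toJsonStringBody(String raw) {
        // Backslashes must be doubled first, otherwise the backslashes
        // added for the quotes would be doubled as well.
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // The regex as the regex engine should receive it: \{"schema".*\}
        String regex = "\\{\"schema\".*\\}";
        System.out.println("\"filterRegex\": \"" + toJsonStringBody(regex) + "\"");
    }
}
```

For these characters the JSON form and the Java string literal happen to look the same, because both languages escape backslashes and double quotes in the same way.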
Turns out "\\{\"schema\".*\\}" is the Java string literal you need to pass to Pattern.compile for it to match properly.
So I've updated agg.json with the following version, escaped for JSON parsing:
"filterRegex": "\\{\"schema\".*\\}"
This seems to be accepted, but it's not matching anything. I can test the expression locally and it matches my JSON string inside the raw event data. I've confirmed that the regex string being used in the JSONSerializer is \{"schema".*\}
In fact, I've even hardcoded it:
p = Pattern.compile("\\{\"schema\".*\\}"); // was: Pattern.compile(this.filterRegex)
The raw event data is:
srv 2016-08-11 06:17:53.262 2016-08-11 06:17:53.019 2016-08-11 06:19:13.600 unstruct d2aa8f52-97c6-4323-b9a3-c73914bbdb3a rb-0.5.2 ssc-0.5.0-kinesis kinesis-0.6.0-common-0.15.0 52.65.x.x 52c1709f-c02a-46e8-a452-208ef0c19df4 {"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.ec/interaction_event/jsonschema/1-0-0","data":{}}} Ruby
Any assistance would be greatly appreciated.
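One thing worth checking when a pattern works in a local tester but not in the application is whether the code calls Matcher.matches(), which requires the whole input to match, or Matcher.find(), which accepts a match anywhere in the input. Which of the two the aggregator's JSON path uses is not established in this thread, so the sketch below only illustrates the difference. The class name is hypothetical and the event line is a shortened stand-in for the real one.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it does not reproduce the aggregator's own code.
class FilterRegexCheck {

    // Same pattern as the hardcoded one: \{"schema".*\}
    static final Pattern FILTER = Pattern.compile("\\{\"schema\".*\\}");

    /** True only if the entire line matches the pattern. */
    static boolean matchesWholeLine(String line) {
        return FILTER.matcher(line).matches();
    }

    /** True if the pattern matches any substring of the line. */
    static boolean foundInLine(String line) {
        return FILTER.matcher(line).find();
    }

    public static void main(String[] args) {
        // Shortened sample: leading fields, embedded JSON, trailing field.
        String line = "srv unstruct {\"schema\":\"x\",\"data\":{}} Ruby";
        System.out.println("matches(): " + matchesWholeLine(line));
        System.out.println("find():    " + foundInLine(line));
    }
}
```

On a line with text before and after the JSON, matches() returns false and find() returns true, which would be consistent with a pattern that tests fine in isolation yet filters out every record.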
Can you paste the entire configuration you are using?
Config below. I thought there might be a default delimiter that was splitting the full event on commas, so the regex couldn't match the split strings. I tried setting the delimiter shown below as well, but I get the same error with or without it:
com.amazonaws.services.kinesis.io.JsonDataExtractor.getData Failed to deserialise any content for Record
[{
  "namespace": "interactions",
  "timeHorizons": ["HOUR", "DAY"],
  "type": "COUNT",
  "labelItems": ["data.event_vendor_type"],
  "dateItem": "data.event_start_time_iso8601",
  "dateFormat": "yyyy-MM-dd'T'HH:mm:ssZ",
  "environment": "prod",
  "labelAttributeAlias": ["type"],
  "dateAttributeAlias": "time",
  "readIOPS": 20,
  "writeIOPS": 40,
  "emitMetrics": false,
  "lineTerminator": "<<<<",
  "dataExtractor": "JSON",
  "filterRegex": "\\{\"schema\".*\\}"
}]
Thanks Paul
Also, I think the issue is that you may be trying to use a Json configuration, when in reality your events aren't actually JSON. You will (maybe unfortunately) need to set the Aggregator type to REGEX and use something like the following regular expression to match your values and lift them up as positional references:
(\w+) ([^\s]+ [^\s]+) ([^\s]+ [^\s]+) ([^\s]+ [^\s]+) (\w+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +([^\s]+)\ +\{\"schema\".*
What this does is extracts the character classes from your test event, where it finds the {"schema".*} value in the string. It then maps those elements positionally into the aggregator context. I am certainly open to an embedded JSON sort of data extractor, but today it doesn't work that way.
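To see what the positional mapping would look like, here is a minimal sketch that runs the expression above against the sample event with plain java.util.regex. The class name is hypothetical, the fields are assumed to be separated by single spaces as they appear in the pasted event (a tab-separated stream would need a different pattern), and how the aggregator itself consumes the groups is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only illustrates the capture groups of the regex.
class PositionalFields {

    // The expression from the reply, split over three lines for readability.
    static final Pattern EVENT = Pattern.compile(
        "(\\w+) ([^\\s]+ [^\\s]+) ([^\\s]+ [^\\s]+) ([^\\s]+ [^\\s]+) (\\w+)"
        + "\\ +([^\\s]+)\\ +([^\\s]+)\\ +([^\\s]+)\\ +([^\\s]+)\\ +([^\\s]+)\\ +([^\\s]+)"
        + "\\ +\\{\"schema\".*");

    // The raw event from this thread, written as a Java string literal.
    static final String SAMPLE =
        "srv 2016-08-11 06:17:53.262 2016-08-11 06:17:53.019 2016-08-11 06:19:13.600 "
        + "unstruct d2aa8f52-97c6-4323-b9a3-c73914bbdb3a rb-0.5.2 ssc-0.5.0-kinesis "
        + "kinesis-0.6.0-common-0.15.0 52.65.x.x 52c1709f-c02a-46e8-a452-208ef0c19df4 "
        + "{\"schema\":\"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0\","
        + "\"data\":{\"schema\":\"iglu:com.ec/interaction_event/jsonschema/1-0-0\",\"data\":{}}} Ruby";

    /** Returns the captured groups in order, or an empty list if the line does not match. */
    static List<String> extract(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = EVENT.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = extract(SAMPLE);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println(i + ": " + fields.get(i));
        }
    }
}
```

Note that the expression has eleven capture groups for the leading fields and none for the JSON payload, so values inside the JSON (such as the event type) would not be available as positional items without extending it.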
Ah OK, I thought the filterRegex option on the JSON configuration would let you extract the JSON out of non-JSON event data and then process it as a single JSON event.
I don't really want to use the regex one; it's a bit messy and possibly a bit fragile to change. I guess I might have to create my own serialiser that can do that and use OBJECT as the dataExtractor option. I might use the JSON one as a base and adapt it to handle this scenario.
Cheers Paul
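As a rough sketch of the pre-processing step such a custom serialiser might need, the snippet below pulls the first balanced {"schema" ...} object out of a line by counting braces and skipping over string contents. The class name and marker constant are hypothetical, and wiring this into the aggregator's serialiser or extractor interfaces is not shown. The extracted text would still need to be parsed with a JSON library such as Jackson, which the stack trace earlier in the thread shows is already in use.

```java
// Hypothetical helper for illustration; not part of amazon-kinesis-aggregators.
class EmbeddedJson {

    // Assumed start of the embedded payload, taken from the sample event.
    static final String MARKER = "{\"schema\"";

    /**
     * Returns the balanced {...} object that starts at the first MARKER,
     * or null if the marker is missing or the braces never balance.
     */
    static String extract(String line) {
        int start = line.indexOf(MARKER);
        if (start < 0) {
            return null;
        }
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = start; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inString) {
                // Inside a string, braces are data; only track escapes and the closing quote.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return line.substring(start, i + 1);
                }
            }
        }
        return null; // braces never balanced
    }

    public static void main(String[] args) {
        String line = "srv unstruct {\"schema\":\"a\",\"data\":{\"schema\":\"b\",\"data\":{}}} Ruby";
        System.out.println(extract(line));
    }
}
```

Compared with a greedy regex, brace counting stops at the end of the first object rather than at the last } on the line, which matters if trailing fields could ever contain a closing brace.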
Sorry, totally unrelated, but could this be run in Lambda instead of Beanstalk?
If so, what would be required, and is it something you might be open to looking at?
Cheers Paul