event-ruler
                                
                                 event-ruler copied to clipboard
                                
                                    event-ruler copied to clipboard
                            
                            
                            
                        Undocumented duplicate key behaviour for events and rules
Describe the bug
High level description of the issue is we can create duplicate keys in our rules and the library will only use the last duplicate in the rule
For example if this is my rule:
{
  "source": ["aws.s3"],
  "source": ["aws.sns"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventSource": ["sns.amazonaws.com"]
  }
}
A customer may expect it to capture both detail S3 and SNS events from CloudTrail. But it turns out we only trigger only the second key (which in this case is SNS).
This is within the bounds of the JSON spec https://datatracker.ietf.org/doc/html/rfc8259#section-4
When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only.
Still, it's better if we 1/ handle it for users when de-serializing via ObjectMapper[1] and 2/ document this edge case if they are passing the JSONNode to us.
[1] https://github.com/FasterXML/jackson-databind/issues/237 shows that we can use DeserializationFeature.FAIL_ON_READING_DUP_TREE_KEY
Here's what I believe:
- When possible, we should check for duplicate keys in Rules as they are added, and reject such rules.
- We should document that for Events, when duplicate keys occur we will only consider the final value in document order, and ensure that the implementation follows that rule.
#2 because that's what the software does now and if you change it you'll probably break at least one relying service, which means it has become part of your contract per Hyrum's law: https://www.hyrumslaw.com
Still digging into this one but (1) might be an issue if clients are not checking for duplicate keys in rules similar to Ruler.matches() method
Proposal for this issue
1/ Add a caveat that Ruler's current APIs follow the default jackson behaviour for deserializting. This means we'll only consider the final value in the document order. Maybe a final note that this is subject to change in future. 2/ For clients that aren't comfortable with this we ask them to pass the JSON-Nodes instead of Strings. This will let them configure de-serialization however they want. We'll need to add a new method from JsonRuleCompiler.java and upwards to allow clients to pass a JsonParser or JsonNode.
This avoids breaking any clients today, and lets them migrate in future. Not expecting any performance hits as we create a parse during rule-compilation https://github.com/aws/event-ruler/blob/main/src/main/software/amazon/event/ruler/JsonRuleCompiler.java#L118
It may be a bit more complex than that.  Ruler has multiple APIs with different approaches. In at least one place it uses Jackson ObjectMapper, which has the behavior you describe.  In most places it uses the nextToken() API, which actually means the software will see the duplicate entries, and I forget what it does.
My proposal would be to figure out what it does and document that.  I would say that we should reject addRule calls that have duplicate fields in the pattern, except that would be a breaking change for some of the AWS callers. So maybe add an addRuleSafely method that reports syntax errors for duplicate keys.
agree with Tim, we shall ensure to reject the duplicated filed name for none array situation, we can enforce our API to allow
{
  "source": ["aws.s3"],
  "source": ["aws.sns"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": [
    { "eventSource": ["s3.amazonaws.com"] },
    { "eventSource": ["sns.amazonaws.com"] }
  ]
}
but reject below
{
  "source": ["aws.s3"],
  "source": ["aws.sns"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventSource": ["sns.amazonaws.com"]
  }
}
oh, please ignore my previous reply, I had thought it is for event, I mean for even, we can enforce above what I mentioned. for rules, we don't support below sort of rule (which means to both of event source need be present in one same array inside event) yet ...
  "detail": [
    { "eventSource": ["s3.amazonaws.com"] },
    { "eventSource": ["sns.amazonaws.com"] }
  ]
Looks like when we aren't using ObjectMapper, we use JsonParser. Just like ObjectMapper, there's a way to configure the parser to enforce STRICT_DUPLICATE_DETECTION.
We definitely can't change the exiting parser / object-mapper (as it'll break existing users) but just like customers have the option to pass a parsed JSONNode for events (and perform any validation on their side before making the call), wondering if its reasonable to allow customers to pass JsonNode / JsonParser to us for rules as well.
It'll mean adding more addRule(..) / deleteRule(...) interfaces in GenericMachine.java and then update the RulerCompiler.java / JsonRuleCompiler.java to take in the parser as well.
Along with it, I like the idea of adding new methods that enforce stricting cheking by default but I think we'll need a different name than addRuleSafely(...). The current methods are still safe, they just have different behavior when it comes to JSON parsing.
thinking more on this, one more option we have is make GenericMachine configurable during construction and allow whichever JSON parsing preferences customers would want via the constructor. We can then tweak our internal mappers / parsers based on their preferences. It'll mean a new constructor, but no new addRule / deleteRule for folks to migrate to (and fewer methods for us to manage keep around).
So, have a MachineFactory or MachineBuilder class?
Yes. I need to check which of the two will be easier for customers to adopt. I suspecting MachineFactory but MachineBuilder makes is easier to introduce more configurations in the future.