[Bug] Package `v8.16.2` contains new rule versions without updates
Related to: https://github.com/elastic/kibana/pull/201825#pullrequestreview-2470373688
Summary
In Kibana, I upgraded the package with prebuilt rules from v8.16.2-beta.1 to v8.16.2 and got 64 detection rules that can be upgraded via the Security Solution UI. Out of those 64 rules, there were 26 rules that had only the version field bumped from something like 2 or 3 to something like 100+, and there were no other changes to rule fields in these newer versions of rules.
Here's an example of how such rules are displayed in the rule upgrade UI:
To Reproduce
On way to reproduce this when testing it with the local Kibana: https://github.com/elastic/kibana/pull/201825#pullrequestreview-2470373688
Expected Behavior
In the package with prebuilt rules, we shouldn't release a new rule version if it doesn't have any changes to any other rule parameters, compared to the previous version of the rule.
Problematic rules
Here's the full list of rules with updated version and no updates to any other fields:
- First Occurrence of Personal Access Token (PAT) Use For a GitHub User
- High Number of Cloned GitHub Repos From PAT
- First Occurrence of IP Address For GitHub Personal Access Token (PAT)
- GitHub App Deleted
- GitHub PAT Access Revoked
- Multiple Okta User Authentication Events with Client Address
- Multiple Okta User Authentication Events with Same Device Token Hash
- GitHub Owner Role Granted To User
- Okta User Sessions Started from Different Geolocations
- First Occurrence of IP Address For GitHub User
- GitHub User Blocked From Organization
- First Occurrence of User-Agent For a GitHub User
- Multiple Device Token Hashes for Single Okta Session
- First Occurrence GitHub Event for a Personal Access Token (PAT)
- First Occurrence of GitHub User Interaction with Private Repo
- First Occurrence of GitHub Repo Interaction From a New IP
- GitHub Protected Branch Settings Changed
- Member Removed From GitHub Organization
- First Occurrence of User Agent For a GitHub Personal Access Token (PAT)
- New GitHub App Installed
- First Occurrence of Private Repo Event from Specific GitHub Personal Access Token (PAT)
- High Number of Okta Device Token Cookies Generated for Authentication
- New GitHub Owner Added
- New User Added To GitHub Organization
- GitHub Repo Created
- GitHub Repository Deleted
These are min_stack version updates :slightly_smiling_face: Part of the PR https://github.com/elastic/detection-rules/pull/4273
With the changing requirements of the integration versions, we identify the kibana version to minstack for rules. Where no query contents are changed in this case.
Will be presenting this into todays Simplified Protections Management Theme Working Group meeting
Notable Debugging and Root Cause Identification
-
For the Rules listed above we have seen a major identifier that the related integration filed is not populated.
-
For example
GitHub Repository Deleted- No Related Integration Populated
- We believe this is what causing the version diff to not show related integrations filed changes
- No Related Integration Populated
-
On rules where we have the related integration filed populated we are not seeing this issue
-
For example GitHub Protected Branch Settings Changed
- Related Integration Filed populated
- Version changes show related integration version bump
- Related Integration Filed populated
-
For these rules analysed The related integration is sporadically missing from the kibana asset.
- Rule
GitHub Repository Deleteddoes not have related integrations in kibana asset- https://github.com/elastic/integrations/blob/14991d9108f8e522ffbda85620900282d6a558bf/packages/security_detection_engine/kibana/security_rule/345889c4-23a8-4bc0-b7ca-756bd17ce83b_2.json#L13
- https://github.com/elastic/integrations/blob/14991d9108f8e522ffbda85620900282d6a558bf/packages/security_detection_engine/kibana/security_rule/345889c4-23a8-4bc0-b7ca-756bd17ce83b_102.json#L13
- Rule
-
Rule
GitHub Protected Branch Settings Changedhas the related integrations in kibana asset- https://github.com/elastic/integrations/blob/14991d9108f8e522ffbda85620900282d6a558bf/packages/security_detection_engine/kibana/security_rule/07639887-da3a-4fbf-9532-8ce748ff8c50_4.json#L15
- https://github.com/elastic/integrations/blob/14991d9108f8e522ffbda85620900282d6a558bf/packages/security_detection_engine/kibana/security_rule/07639887-da3a-4fbf-9532-8ce748ff8c50_105.json#L15
-
The nature of this corrupted state is still unclear! because its the same code that generate assets for all rule types in the version!
-
The reason I am stating that its very sporadic in nature is because another rule termed problematic in the above bug
GitHub App Deletedis showing up rightly in my update diff- The asset also has related integration field populated https://github.com/elastic/integrations/blob/14991d9108f8e522ffbda85620900282d6a558bf/packages/security_detection_engine/kibana/security_rule/fd01b949-81be-46d5-bcf8-284395d5f56d_2.json#L16
cc @Mikaayenson @approksiu @banderror
Spot Check on My Test Stack
- Works as Expected Rules --> Meaning the Version Diff shows related integration Field, Kibana Asset has the related integration filed populated
| Rule Name | Kibana Asset with related integrations |
|---|---|
| First Occurrence of Personal Access Token (PAT) Use For a GitHub User | Asset |
| High Number of Cloned GitHub Repos From PAT | Asset |
| First Occurrence of IP Address For GitHub Personal Access Token (PAT) | Asset |
| GitHub App Deleted | Asset |
| GitHub PAT Access Revoked | Asset |
| GitHub Owner Role Granted To User | Asset |
| First Occurrence of IP Address For GitHub User | Asset |
| GitHub User Blocked From Organization | Asset |
| First Occurrence of User-Agent For a GitHub User | Asset |
| First Occurrence GitHub Event for a Personal Access Token (PAT) | Asset |
| First Occurrence of GitHub User Interaction with Private Repo | Asset |
| First Occurrence of GitHub Repo Interaction From a New IP | Asset |
| GitHub Protected Branch Settings Changed | Asset |
| Member Removed From GitHub Organization | Asset |
| First Occurrence of User Agent For a GitHub Personal Access Token (PAT) | Asset |
| New GitHub App Installed | Asset |
| First Occurrence of Private Repo Event from Specific GitHub Personal Access Token (PAT) | Asset |
| New GitHub Owner Added | Asset |
| New User Added To GitHub Organization | Asset |
| GitHub Repo Created | Asset |
- Bug --> As described above we dont have the related integration field populated.
| Rule Name | Kibana Asset without related integrations |
|---|---|
| Multiple Okta User Authentication Events with Client Address | Asset |
| Multiple Okta User Authentication Events with Same Device Token Hash | Asset |
| Okta User Sessions Started from Different Geolocations | Asset |
| Multiple Device Token Hashes for Single Okta Session | Asset |
| High Number of Okta Device Token Cookies Generated for Authentication | Asset |
| GitHub Repository Deleted | Asset |
Also out of the 26 rules reported by @banderror , I was able to reproduce only 6 of them :), the rest of them seem to render the right set of information for me cc @Mikaayenson
I see the same behavior as @banderror. @shashank-elastic I will share the instance where you can check. I see both issues - missing integration, and no changes in any rule fields. The rendering depends on the rule version you currently have, since you skipped some updates - there are more changes accumulated that you can view in diff.
@banderror @approksiu There are two issues.
- Some rules do not have event.dataset when they should like
GitHub Repository Deleted. This can be easily fixed. - Some rules are ES|QL. Since we do not have a parser for this language the build time fields are not populated. This includes
related_integrationsandrequired_fields.
@Mikaayenson thanks! Are there plans to fix ES|QL rules build time field issues?
@Mikaayenson Yes thats true I was able to drill dow to the source of the First Version of the File that was added in release https://github.com/elastic/ia-trade-team/issues/388
The very first version of the rule, https://github.com/elastic/integrations/pull/10239/files#diff-0208dd40dfe6b85cb7f8bdba6bcd7305d84f9303bbf86514c8a0f9624157a8de does not have the related integrations_field.
To confirm the same here is another ESQL rule sample AWS STS Role Chaining that does not populate related integrations in Kibana
Have created a PR to fix even.dataset -https://github.com/elastic/detection-rules/pull/4278 cc @Mikaayenson @approksiu
@approksiu From your test stack I was able to reproduce the issue on the UI.
But from the asset perspective all the necessary required fields for related_integrations are rightly populated in non ES|QL supported rules.
Looking at the custom package that we created to test smart limits the only major difference we see is the json format
https://github.com/elastic/integrations/pull/11866/files#diff-49be033ff70d5dd333c4de4affe46c089228d1a26ec28fb57b4d8e0473f27724.
But all of the required information is rightly populated.
If we move to the latest version 8.16.2-beta.2 the file format is rectified https://github.com/elastic/integrations/pull/11900/files#diff-c0d42a1b673f8da2e46f126338934d5466bd7c9eb1f740f02870e4d3b2c5f238
But its unclear weather this is causing the issue, and since am not upgrading from a smart limit package, I may not be seeing this issue as in your stack or @banderror stack
From @approksiu Test Stack
From @shashank-elastic Test Stack
Also please not am moving from version 1--> 103 and the other stack is moving from 2 --> 103. One possible way to look at this is we are trying to diff from v8.16.2-beta.1 which was a custom package only to test smart limits, released ad-hoc.
So a conclusive debugging would be
- ES|QL Rule types don't have related_integrations filed populated, which makes the update diff only show version bumps and not actual changes
- Rule without event.dataset is fixed in PR https://github.com/elastic/detection-rules/pull/4278
- For other EQL rule types having empty diff based on the version bump movement we are seeing, it is likely that the previous version of the rule was from a custom built package to test only smart limits and we should be ideally looking at 8.16.1 --> 8.16.2 rule update diffs ( This is going by the description of the bug that we see this when we upgrade v8.16.2-beta.1 to v8.16.2)
Having said that, v8.16.2-beta.1 was temporary only a beta package to test smart limits and we have override that with the latest GA package so customers seeing this would be highly unlikely!
cc @Mikaayenson @banderror
@Mikaayenson thanks! Are there plans to fix ES|QL rules build time field issues?
@approksiu We do not have plans to fix ES|QL build time fields. Going back to Mar 2023 we were recommended by the ESQL team to build a python parser in-house. We do not have the capacity to do this, and any ad-hoc approach will result in bugs/tech debt.
We still do need this parser if we want to ship the build time fields (related integrations, required fields, etc) in a scalable and reliable way.
One alternative would be to parse ESQL rules in kibana since it's already available and extract the build time fields. In a previous POC I explored in detail the LOE to develop and maintain a parser and it quickly became evident that we didn't have to the capacity to develop/maintain this. We need help / resources for this gap.
One stop gap for the related integrations field would be:
- Adding the related integrations content directly to the toml file. This is not scalable, very much error prone, and would require DEs to know the specific version to add, and will not be automatically updated when min stacks occur. Even though we can do this out of the box with no code changes, I do not recommend this for the reasons just mentioned.
- With a line or two of code changes, we could attempt to repurpose the
integrationsfield within the toml file. This information is duplicative since the event dataset may also be in the query. This approach would dynamically calculate the version. However it's abusing the availableintegrationfield and extended the definition of how it should be used (which potentially creates opaque issues and tech debt), making it harder to maintain long term.
[!IMPORTANT] For any chances to our detection engineering workflows, we ( @elastic/threat-research-and-detection-engineering ) will need to discuss internally to minimize impact. Also, minor note, these two ideas will still not address the required fields issue.
Thanks for looking into it @Mikaayenson @shashank-elastic @approksiu! A few comments on the above:
Also please not am moving from version 1--> 103 and the other stack is moving from 2 --> 103. One possible way to look at this is we are trying to diff from v8.16.2-beta.1 which was a custom package only to test smart limits, released ad-hoc.
I think it would be more correct to say that we diff rule versions between each other, not packages. Let's break this example down.
If a new 2 version was added in the prerelease v8.16.2-beta.1 package, my expectation was that versions 2 and 103 would also appear in the subsequent v8.16.2 and following release packages. This wouldn't affect any real users because no user would be able to install version 2 from v8.16.2-beta.1 => no user would be able to upgrade from 2 to 103.
However, the fact that these versions 2 and 103 were added to the package by the release process, and the diff between them is empty (except the version field itself), looks suspicious as if there is/are bugs in the release process.
If we abstract away from prerelease and release package, I think there would be a few rules of thumb:
- In Kibana, for simplicity we accept that any user can be upgrading a rule from any
Xversion to anyYversion, whereX < Y. - In Kibana, we don't control what
XandYcan be. - No
XtoYupgrade should yield an empty diff, whereYis the next version afterX. In other words, any bump of theversionfield should be accompanied by some changes to other rule fields (parameters).
For other EQL rule types having empty diff based on the version bump movement we are seeing, it is likely that the previous version of the rule was from a custom built package to test only smart limits
This also sounds a bit suspicious. Are you saying that we added some fake rule versions to the v8.16.2-beta.1 package? My expectation was that v8.16.2-beta.1 would be equal to v8.16.2 if we wanted to include historical versions to v8.16.2. Which means that the issue is not with the v8.16.2-beta.1 package itself, but with the package build and release process that potentially contains some bugs.
ES|QL Rule types don't have related_integrations filed populated, which makes the update diff only show version bumps and not actual changes We do not have plans to fix ES|QL build time fields One alternative would be to parse ESQL rules in kibana...
The concern here is not about having or not having related_integrations and required_fields specified for prebuilt ES|QL rules. We're concerned about bumping their versions when their fields are not getting any updates in the detection-rules repo.
I think the latter should be fixed on the detection-rules repo side. As far as I understand, when we bump an integration's version, we update the corresponding prebuilt rules that indirectly depend on this integration via rule query, and increment their versions. So, if we can't parse ES|QL queries yet, we shouldn't be incrementing ES|QL rules versions as part of this process.
Rule without event.dataset is fixed in PR https://github.com/elastic/detection-rules/pull/4278
Great, thanks!
I'm going to re-test the upgrade workflow with the new v8.17.1 package and some older packages. Will post here when I have an update. Thanks!
So this sounds a bit suspicious. Are you saying that we added some fake rule versions to the
v8.16.2-beta.1package? My expectation was thatv8.16.2-beta.1would be equal tov8.16.2if we wanted to include historical versions tov8.16.2. Which means that the issue is not with thev8.16.2-beta.1package itself, but with the package build and release process that potentially contains some bugs.
@banderror we never introduced any fake rule versions, we followed the beta release package process like we do for our rule set and add historical versions of the rule, like mentioned in https://github.com/elastic/detection-rules/issues/4150#issuecomment-2498132515.
We did not run any new lock versions, but built from the latest rule versions a custom package v8.16.2-beta.1 to facilitate testing at that moment. I quickly wanted to clarify that there were no fake versions manually introduced. But I will also wait for the test results with package v8.17.1 which was released yesterday!
@shashank-elastic Thank you, got it. I also just updated my comment above because I noticed a few mistakes from my side. Corrected.
@banderror / @xcrzx is this that is something that is (or will be) actively worked, or can we close the issue?
@shashank-elastic Yes, it will be. We should re-test the issue and see how many rule updates like that we can actually get on a real package. We haven't had a chance to work on this yet.
@nikitaindik Any updates on testing this one?
@banderror Yes. I made a script to test all non-beta versions of the package to find which package versions are problematic. The script basically installs a certain package version, installs all rules from this version, then installs the immediate next package version and calls our upgrade review endpoint to check which rules have version only updates. I ran the script from Kibana on the latest main (9.2).
Tested on these package versions (click to unfold)
8.17.0 8.17.1 8.17.2 8.17.3 8.17.4 8.17.5 8.17.6 8.17.7 8.17.8 8.17.9 8.17.10 8.17.11 8.17.12 8.17.13 8.17.14 8.18.1 8.18.2 8.18.3 8.18.4 8.18.5 8.18.6 8.18.7 9.0.1 9.0.2 9.0.3 9.0.4 9.0.5 9.0.6 9.0.7
I have found that these upgrades have rules with version-only changes:
- from
8.17.7to8.17.8(1194 rules with version-only changes – 8.17.7-8.17.8.csv) - from
8.17.9to8.17.10(1 rule with version-only changes – 8.17.9-8.17.10.csv) - from
8.18.2to8.18.3(1 rule with version-only changes – 8.18.2-8.18.3.csv) - from
9.0.1to9.0.2(1 rule with version-only changes – 9.0.1-9.0.2.csv) - from
9.0.2to9.0.3(1297 rules with version-only changes – 9.0.2-9.0.3.csv)
Attached the CSV files with rule_ids, names, version changes and types of the rules with version-only changes.
Most of the rules I found are non-ESQL. Some rules, like this one seem to have only have changes in updated_date and min_stack_version. These fields do not end up in "alert" SOs our end, so for us these fields don't exist.
@Mikaayenson @shashank-elastic Could you please review the CSVs and investigate why these rules might have version-only updates?
UPD: Just noticed that I skipped 8.18.x versions. Edited the comment to add them.
@approksiu According to Nikita's update above, it looks like at this moment the issue shouldn't affect any users who will be upgrading the package with prebuilt rules to its latest version. However, the cause of the issue is still there, and we might face it again in the future versions of the package. JFYI.
Thanks @nikitaindik!
Hey @nikitaindik While I have gone through the CSV provided for analysis, these were inherently caused by how we used to increment versions for integration changes across rule types. This was fixed and released as part of the PR, where a different hashing mechanism was introduced to handle version increments. This is already implemented and released in package versions 9.0.4 and above. In the latest versions and future versions we should not see this issue.
cc @Mikaayenson
@approksiu
The issue appears to be resolved on the TRADE side starting from package v9.0.4. However, some users may still be on an older package where this issue persists.
Example scenario:
- A user installs a rule while on package v9.0.1.
- This rule receives a version-only update in v9.0.2, which the user does not install.
- Newer packages are released over time (v9.0.3, v9.0.4, etc.), but they do not include an update for this specific rule.
- The user continues to see the issue even if they are on the latest package until they either accept a version-only update or TRADE releases a "real" update for this rule.
Do we want to address such cases by adding a message to the flyout when we detect version-only changes? Perhaps we can encourage users to upgrade, even if there are no useful changes. Can you suggest a suitable message?
Another option would be to wait until all problematic rules receive a "real" upgrade. However, in the meantime, we may get questions from users about these empty updates.
Let's not add to the flyout for this special case, and answer the questions if they come. All these rules can be updated in bulk, and the issue is resolved moving on.
Ok, thanks @approksiu @nikitaindik and @shashank-elastic. I'm closing the ticket.