Secure Excecution of Groovy Code
Description
Adds security and safety checks for Groovy scripts by performing static analysis on the AST. When enabled this feature will check every Groovy script sent to Pinot against an Allowed list of operations and classes. If a script uses anything which is not in the Allowed list then it will be rejected by Pinot.
The allowed constructs can be managed via Cluster Config APIs. The administrator can define allow lists for: imports, static imports,
receivers, and set a list of disallowed method names. The imports lists define what constructs can be imported and used within a
Groovy script, this includes referencing a construct through it's full canonical name (e.g. System.out.println("hello")). The disallowed
methods list is provided to block calls to methods like invoke and execute which would allow a user to execute some malicious operations
such as shell commands.
The static analysis is performed by the org.codehaus.groovy.control.customizers.SecureASTCustomizer, which traverses the AST of a
Groovy script and applies configuration validation rules to make sure a Groovy script satisfies the security restrictions.
The configuration for the analyzer is stored in the Cluster Config under the pinot.server.query.groovy.analyzer.static field:
{
"id": "pinot",
"simpleFields": {
"allowParticipantAutoJoin": "true",
"default.hyperloglog.log2m": "8",
"enable.case.insensitive": "true",
"pinot.broker.enable.query.limit.override": "false",
"pinot.broker.disable.query.groovy": "false",
"pinot.server.query.groovy.analyzer.static": "{\"allowedReceivers\":[\"java.lang.String\",\"java.lang.Math\"],\"allowedImports\":[\"java.lang.Math\"],\"allowedStaticImports\":[\"java.lang.Math\"],\"disallowedMethodNames\":[\"execute\",\"invoke\"]}"
},
"mapFields": {},
"listFields": {}
}
To enable the Analyzer, add a configuration to the pinot.server.query.groovy.analyzer.static field. If this field has a value then Pinot will
enable the static analyzer. This property can be easily configured by POSTing to the /cluster/groovy/analyzer/static/config API, with a
body like the following example:
// Example Static Analyzer Config. This will allow basic arithmetic operations along with constructs within Math and String to be used.
// List operations, for example, would be blocked.
{
"allowedReceivers": [
"java.lang.String",
"java.lang.Math"
],
"allowedImports": [
"java.lang.String",
"java.lang.Math"
],
"allowedStaticImports": [
"java.lang.String",
"java.lang.Math"
],
"disallowedMethodNames": [
"execute",
"invoke"
]
}
The current config can be retrieved by GETting /cluster/groovy/analyzer/static/config. A recommended default configuration can be gotten
by GETting /cluster/groovy/analyzer/static/default.
Testing
Unit tests were added that check the analyzer with legal scripts, illegal scripts, and changing the configuration. Unit tests were also added to validate the serialization and deserialization for configurations to and from JSON.
A Pinot image with this feature was deployed to a test cluster and verified that when enabled Groovy scripts with illegal operations were blocked and legal operations were allowed, and, when disabled, all Groovy scripts were allowed. In addition, and independent engineer went through and tested the analyzer against a set of malicious scripts they had compiled.
Codecov Report
Attention: Patch coverage is 51.20000% with 61 lines in your changes missing coverage. Please review.
Project coverage is 63.79%. Comparing base (
59551e4) to head (e026c21). Report is 1186 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #14197 +/- ##
============================================
+ Coverage 61.75% 63.79% +2.04%
- Complexity 207 1535 +1328
============================================
Files 2436 2624 +188
Lines 133233 144560 +11327
Branches 20636 22109 +1473
============================================
+ Hits 82274 92224 +9950
- Misses 44911 45541 +630
- Partials 6048 6795 +747
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration1 | 100.00% <ø> (+99.99%) |
:arrow_up: |
| integration2 | 0.00% <ø> (ø) |
|
| java-11 | 63.74% <51.20%> (+2.03%) |
:arrow_up: |
| java-21 | 63.68% <51.20%> (+2.06%) |
:arrow_up: |
| skip-bytebuffers-false | 63.78% <51.20%> (+2.04%) |
:arrow_up: |
| skip-bytebuffers-true | 63.64% <51.20%> (+35.92%) |
:arrow_up: |
| temurin | 63.79% <51.20%> (+2.04%) |
:arrow_up: |
| unittests | 63.79% <51.20%> (+2.04%) |
:arrow_up: |
| unittests1 | 55.54% <73.56%> (+8.65%) |
:arrow_up: |
| unittests2 | 34.29% <8.80%> (+6.56%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I find this post very similar to what we wanted: https://melix.github.io/blog/2011/05/12/customizing_groovy_compilation_process.html Are you following the steps in this post?
I find this post very similar to what we wanted: https://melix.github.io/blog/2011/05/12/customizing_groovy_compilation_process.html Are you following the steps in this post?
Pretty much, I believe this guy wrote the SecureASTCustomizer I used and was one of the first documents about securing Groovy that I found.
Not quite sure what is happening with the failing tests, looks like it's some CLP tests that are failing.
pinot.server.query.groovy.analyzer.static
this covers both ingestion time and query time groovy i assume? having it start with "pinot.server.query" would be confusing
/cluster/groovy/analyzer/static/configAPI,
Is this the right convention to use for such an API? Only until cluster/config/groovy are we doing a hierarchy, so perhaps something like cluster/configs/groovystaticConfig is a more apt name?
Good catch. Will fix.
On Sat, Oct 12, 2024 at 8:36 PM Neha Pawar @.***> wrote:
pinot.server.query.groovy.analyzer.static
this covers both ingestion time and query time groovy i assume? having it start with "pinot.server.query" would be confusing
— Reply to this email directly, view it on GitHub https://github.com/apache/pinot/pull/14197#issuecomment-2408763744, or unsubscribe https://github.com/notifications/unsubscribe-auth/BAASDJ7SYAMR7XZOTGADAZTZ3G57LAVCNFSM6AAAAABPVKS7F2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMBYG43DGNZUGQ . You are receiving this because you authored the thread.Message ID: @.***>
/cluster/groovy/analyzer/static/configAPI,Is this the right convention to use for such an API? Only until cluster/config/groovy are we doing a hierarchy, so perhaps something like cluster/configs/groovystaticConfig is a more apt name?
I initially had that, but then I thought we might add additional Groovy security features (such as executing in a sandbox) in the future, and grouping Groovy configs under cluster/configs/groovy/ would be more future proof.
perhaps /cluster/configs/groovy/staticConfig would be better.
Close this one and prefer #14844