trivy
trivy copied to clipboard
feat: Add the ability to specify schema input
User Story
As a: policy writer/contributor (hereon referred to as policy contributor), I would like to: specify the schema of my input to Rego policies, So that: I can get better feedback from the Rego engine regarding any errors in my policy upfront.
What exists today
- Today schema is hardcoded to be
schema.input
which is neither helpful to the rego policy engine nor correct rego. As an example can be found here https://github.com/aquasecurity/defsec/blob/master/internal/rules/policies/cloud/policies/aws/rds/disable_public_access.rego#L5-L6 - As a result of such a hardcoded schema, the policy contributor is unable to add a custom schema to each policy. Adding schema helps the rego engine evaluate if the policy if syntactically correct or otherwise prior to evaluation. More details on how to use a schema input can be found here: https://www.openpolicyagent.org/docs/latest/schemas/
What is desired
The policy contributor should be able to define a schema via the schema annotations. An example of this could look like the following:
# METADATA
# title: "RDS Publicly Accessible"
# description: "Ensures RDS instances are not launched into the public cloud."
# scope: package
# schemas:
# - input: schema["cloud"]
Notice the input field is populated with an absolute path to the schema file, in this case the schema for AWS Cloud.
Another example for a Dockerfile could entail as follows:
# METADATA
# title: ADD instead of COPY
# description: You should use COPY instead of ADD unless you want to extract a tar file.
# scope: package
# related_resources:
# - https://docs.docker.com/engine/reference/builder/#add
# schemas:
# - input: schema["dockerfile"]
Where dockerfile.json
could be such an input: https://github.com/aquasecurity/defsec/blob/master/pkg/rego/schemas/dockerfile.json
Currently this functionality is handled internally (with input automatically being detected based on the scanner being run).
The goal is to be able to specify this within the policy so that policy writers can define their own schema, if and when it varies from the one that will be assigned to it internally.
Acceptance Criteria
- As a rego policy contributor, I should be able to specify the schema to the input as part of the policy.
- The input schema must conform to the OPA specifications as listed here: https://www.openpolicyagent.org/docs/latest/schemas/#schema-annotations
- A user might not wish to specify this (Or specify the existing
schema.input
), in such a case the existing functionality will remain (no improvement or breakage)
Signed-off-by: Simar [email protected]
is this a work item though? looks like the acceptance criteria you defined is already met.
I don't think there's an issue with allowing authors to specify file schema, the issue is that 1) we don't use this in our policies. our policies (schemas) look alien to "normal" people, and to tooling 2) if author want to use a file reference, it isn't clear what is our guidelines around it (for policies in defsec): absolute like you mentioned is not usable in this case. remote file is ok? do we care where it is? do we care for different policies with same input use the same schema? do we require certain versioning?
Dumping my personal notes here on the code flow (just for reference):
rego rules declare schema of: input.schema
. This is a "virtual name" that automatically contains the "correct" schema per scanner
type.
init
- there are a handful of pre-defined input document's schemas under
pkg/rego/schemas
. - they are embedded at build time into go vars.
- on rego scanner init, it checks the
source
type (what are we scanning). based on the source, we select the appropriate scehma (or default toscehma.Anything
). https://github.com/aquasecurity/defsec/blob/4f3187c78d6b00afbfc741c7d8c46c663774fc25/pkg/rego/scanner.go#L130 (schema is unmarshaled intointerface{}
, not kept as string).
run query
- load the selected, parsed, json schema into rego under the ref
schema.input
https://github.com/aquasecurity/defsec/blob/4f3187c78d6b00afbfc741c7d8c46c663774fc25/pkg/rego/scanner.go#L178 - set it as
option
on rego engine (rego.Schemas(schemaSet)
).
There's also schemas.None
which is supposed to be invalid schema
https://github.com/aquasecurity/defsec/blob/4f3187c78d6b00afbfc741c7d8c46c663774fc25/pkg/rego/scanner.go#L147
is this a work item though? looks like the acceptance criteria you defined is already met.
I don't think there's an issue with allowing authors to specify file schema, the issue is that 1) we don't use this in our policies. our policies (schemas) look alien to "normal" people, and to tooling 2) if author want to use a file reference, it isn't clear what is our guidelines around it (for policies in defsec): absolute like you mentioned is not usable in this case. remote file is ok? do we care where it is? do we care for different policies with same input use the same schema? do we require certain versioning?
Good feedback, that's why I created this issue to iron out the details.
Just sharing my observations below:
So from my understanding, something like this should be supported:
# METADATA
# schemas:
# - input: schema["dockerfile"]
package defsec.test
deny {
input.evil == "lol"
}
Should work out of the box, if a policy writer decided to write this. This would guarantee that the Dockerfile
schema is used to compile/evaluate this policy. Which when run should return something as follows:
1 error occurred: testpolicy.rego:8: rego_type_error: undefined ref: input.evil
input.evil
^
have: "evil"
want (one of): ["Stages"]
which is standard Rego/OPA behaviour. This document is a good read to understand the OPA Schema Annotation conventions: https://www.openpolicyagent.org/docs/latest/schemas/#schema-annotations
As for the syntax of the schema annotation itself, we should only support what is currently supported by OPA today. The above document also highlights that.
There are some open questions I have regarding schemas that we support out of the box and provide schema files for (dockerfile, aws, k8s etc.) vs. custom schemas from the user. I'm working on a PoC to evaluate them.
Good to see #1143 ! I didn't catch it in time, but have a few followup questions/comments I was hoping you can answer:
- document available (embedded) schemas
- document the input json structure in human readable
- can users bring their schemas for custom rules? (probably out of scope)
- use subpackage scope for schemas instead of duplications
- k8s schemas are too generic, consider using specific ones. this will be related to selector discussion
- how/when do we update schemas? should they be versioned?
This issue is stale because it has been labeled with inactivity.
@simar7 should this issue be closed or not?
@simar7 ping
Yes sorry it should be closed. I opened one for improving it further.