trivy feat: Add the ability to specify schema input

feat: Add the ability to specify schema input

Open simar7 opened this issue 1 year ago • 4 comments

User Story

As a: policy writer/contributor (hereon referred to as policy contributor), I would like to: specify the schema of my input to Rego policies, So that: I can get better feedback from the Rego engine regarding any errors in my policy upfront.

What exists today

Today schema is hardcoded to be schema.input which is neither helpful to the rego policy engine nor correct rego. As an example can be found here https://github.com/aquasecurity/defsec/blob/master/internal/rules/policies/cloud/policies/aws/rds/disable_public_access.rego#L5-L6
As a result of such a hardcoded schema, the policy contributor is unable to add a custom schema to each policy. Adding schema helps the rego engine evaluate if the policy if syntactically correct or otherwise prior to evaluation. More details on how to use a schema input can be found here: https://www.openpolicyagent.org/docs/latest/schemas/

What is desired

The policy contributor should be able to define a schema via the schema annotations. An example of this could look like the following:

# METADATA
# title: "RDS Publicly Accessible"
# description: "Ensures RDS instances are not launched into the public cloud."
# scope: package
# schemas:
# - input: schema["cloud"]

Notice the input field is populated with an absolute path to the schema file, in this case the schema for AWS Cloud.

Another example for a Dockerfile could entail as follows:

# METADATA
# title: ADD instead of COPY
# description: You should use COPY instead of ADD unless you want to extract a tar file.
# scope: package
# related_resources:
# - https://docs.docker.com/engine/reference/builder/#add
# schemas:
# - input: schema["dockerfile"]

Where dockerfile.json could be such an input: https://github.com/aquasecurity/defsec/blob/master/pkg/rego/schemas/dockerfile.json

Currently this functionality is handled internally (with input automatically being detected based on the scanner being run).

The goal is to be able to specify this within the policy so that policy writers can define their own schema, if and when it varies from the one that will be assigned to it internally.

Acceptance Criteria

As a rego policy contributor, I should be able to specify the schema to the input as part of the policy.
The input schema must conform to the OPA specifications as listed here: https://www.openpolicyagent.org/docs/latest/schemas/#schema-annotations
A user might not wish to specify this (Or specify the existing schema.input), in such a case the existing functionality will remain (no improvement or breakage)

Signed-off-by: Simar [email protected]

Jan 17 '23 00:01 simar7

is this a work item though? looks like the acceptance criteria you defined is already met.

I don't think there's an issue with allowing authors to specify file schema, the issue is that 1) we don't use this in our policies. our policies (schemas) look alien to "normal" people, and to tooling 2) if author want to use a file reference, it isn't clear what is our guidelines around it (for policies in defsec): absolute like you mentioned is not usable in this case. remote file is ok? do we care where it is? do we care for different policies with same input use the same schema? do we require certain versioning?

Jan 17 '23 07:01 itaysk

Dumping my personal notes here on the code flow (just for reference):

rego rules declare schema of: input.schema . This is a "virtual name" that automatically contains the "correct" schema per scanner type.

init

there are a handful of pre-defined input document's schemas under pkg/rego/schemas.
they are embedded at build time into go vars.
on rego scanner init, it checks the source type (what are we scanning). based on the source, we select the appropriate scehma (or default to scehma.Anything). https://github.com/aquasecurity/defsec/blob/4f3187c78d6b00afbfc741c7d8c46c663774fc25/pkg/rego/scanner.go#L130 (schema is unmarshaled into interface{}, not kept as string).

run query

load the selected, parsed, json schema into rego under the ref schema.input https://github.com/aquasecurity/defsec/blob/4f3187c78d6b00afbfc741c7d8c46c663774fc25/pkg/rego/scanner.go#L178
set it as option on rego engine (rego.Schemas(schemaSet)).

There's also schemas.None which is supposed to be invalid schema https://github.com/aquasecurity/defsec/blob/4f3187c78d6b00afbfc741c7d8c46c663774fc25/pkg/rego/scanner.go#L147

Jan 17 '23 08:01 itaysk

is this a work item though? looks like the acceptance criteria you defined is already met.

I don't think there's an issue with allowing authors to specify file schema, the issue is that 1) we don't use this in our policies. our policies (schemas) look alien to "normal" people, and to tooling 2) if author want to use a file reference, it isn't clear what is our guidelines around it (for policies in defsec): absolute like you mentioned is not usable in this case. remote file is ok? do we care where it is? do we care for different policies with same input use the same schema? do we require certain versioning?

Good feedback, that's why I created this issue to iron out the details.

Just sharing my observations below:

So from my understanding, something like this should be supported:

# METADATA
# schemas:
# - input: schema["dockerfile"]
package defsec.test

deny {
    input.evil == "lol"
}

Should work out of the box, if a policy writer decided to write this. This would guarantee that the Dockerfile schema is used to compile/evaluate this policy. Which when run should return something as follows:

1 error occurred: testpolicy.rego:8: rego_type_error: undefined ref: input.evil
        input.evil
              ^
              have: "evil"
              want (one of): ["Stages"]

which is standard Rego/OPA behaviour. This document is a good read to understand the OPA Schema Annotation conventions: https://www.openpolicyagent.org/docs/latest/schemas/#schema-annotations

As for the syntax of the schema annotation itself, we should only support what is currently supported by OPA today. The above document also highlights that.

There are some open questions I have regarding schemas that we support out of the box and provide schema files for (dockerfile, aws, k8s etc.) vs. custom schemas from the user. I'm working on a PoC to evaluate them.

Jan 18 '23 01:01 simar7

Good to see #1143 ! I didn't catch it in time, but have a few followup questions/comments I was hoping you can answer:

document available (embedded) schemas
document the input json structure in human readable
can users bring their schemas for custom rules? (probably out of scope)
use subpackage scope for schemas instead of duplications
k8s schemas are too generic, consider using specific ones. this will be related to selector discussion
how/when do we update schemas? should they be versioned?

Jan 28 '23 18:01 itaysk

This issue is stale because it has been labeled with inactivity.

Mar 31 '23 00:03 github-actions[bot]

@simar7 should this issue be closed or not?

Mar 31 '23 11:03 itaysk

@simar7 ping

Apr 28 '23 17:04 itaysk

Yes sorry it should be closed. I opened one for improving it further.

Apr 28 '23 18:04 simar7

trivy trivy copied to clipboard

feat: Add the ability to specify schema input

User Story

What exists today

What is desired

Acceptance Criteria

init

run query

trivy
trivy copied to clipboard