[Feature] Supporting field transformers in filtering language
Action plan
We plan to execute the changes as follows:
- [ ] Preparing the groundwork on the libsinsp "filter checks" data structure, which evaluates filter comparisons at runtime
- https://github.com/falcosecurity/libs/pull/1787
- https://github.com/falcosecurity/libs/pull/1792
- https://github.com/falcosecurity/libs/pull/1793
- https://github.com/falcosecurity/libs/pull/1795
- [ ] Updating the filter grammar and AST (Abstract Syntax Tree) definitions
- [ ] Supporting the new feature in the "sinsp filter compiler", which compiles filter ASTs into the filtercheck-based executable form evaluated at runtime
- [ ] Supporting the new feature in the sinsp output formatters, which are used to format Falco rules output and print-out information about event payloads and data fields
- [ ] Document all the features on falco.org
All steps will require in-depth tests.
Motivation
Over time, we collected plenty of requests in the context of the filtering language of libsinsp. This little DSL (domain specific language) is the basis on which Falco rules are developed and executed, and also serves other use cases across the different adopters of the Falco libs. Feedback from adopters always indicated that the language is simple and expressive, but we acknowledge that it also suffers from some limitations. To list some:
- Small modifications to existing fields mandate adding new fields
- Minor changes in the semantics of a comparison operator mandate adding new operators (e.g. istartswith and iglob, which are just case-insensitive versions of already-existing operators)
- Field-to-field comparisons are not possible
- Interpolation, composition, or small runtime transformations of existing types are not possible
Here's a non-comprehensive collection of issues from our repositories related to the topic:
- https://github.com/falcosecurity/libs/pull/1547
- https://github.com/falcosecurity/libs/issues/1627
- https://github.com/falcosecurity/falco/issues/2612
- https://github.com/falcosecurity/falco/issues/2496
- https://github.com/falcosecurity/falco/issues/2484
- https://github.com/falcosecurity/falco/issues/2403
The general feeling is that changing the nature of the language, or making it extra complex, would defeat the simplicity principles that made the rules language widely adopted and easy to learn. Moreover, the grammar of the filtering language is quite fragile and leaves little room for edits without the risk of introducing large-scale breaking changes.
However, we also argue that there are minor, feasible changes that could make the language far more expressive and powerful.
Feature
I want to share an R&D project that @Andreagit97 and I spent some time on over the past weeks.
Our proposal is to update the filtering language with the notion of field transformers. Transformers are declarative transformations that can be applied to filter fields (e.g. proc.name) with the purpose of supporting new detection scenarios and filtering capabilities.
The proposed syntax is as follows (all fields and scenarios are random simple examples):
- fd.name startswith "/etc": Traditional use case, which will be supported as usual
- tolower(fd.name) startswith "/etc": Lower case conversion for string field types
- toupper(fd.name) startswith "/ETC": Upper case conversion for string field types
- b64(evt.buffer) bcontains deadbeef: base64 decoding for string and bytebuf field types
- proc.name != val(proc.pname): field-to-field comparisons
- tolower(proc.name) != tolower(proc.pname): field-to-field comparisons, with transformers
- toupper(b64(fd.name)) = TESTFILE: composed transformers (base64 decoding followed by upper case conversion)
Here are some properties of field transformers:
- Implemented as an additional feature of the language, thus not introducing any breaking change with respect to the current state of things
- Have strong typing, thus non-ambiguous
- Easy to implement new ones for future use cases, making them future proof
- Are composable (e.g. toupper(b64(fd.name)))
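To make the typing and composability properties concrete, here is a minimal Python sketch of how a chain of transformers could be applied to a field value. It is not the libsinsp implementation: the registry layout, the apply_transformers helper, and the choice that b64 returns a string when given a string are all assumptions made for illustration.

```python
import base64
from typing import Callable, Dict, List, Tuple, Union

Value = Union[str, bytes]


def _b64(value: Value) -> Value:
    # Assumption: decoding keeps the input kind (str in, str out; bytes in, bytes out).
    decoded = base64.b64decode(value)
    return decoded.decode("utf-8", errors="replace") if isinstance(value, str) else decoded


# Hypothetical registry: transformer name -> (accepted input types, function).
TRANSFORMERS: Dict[str, Tuple[Tuple[type, ...], Callable[[Value], Value]]] = {
    "tolower": ((str,), lambda v: v.lower()),
    "toupper": ((str,), lambda v: v.upper()),
    "b64": ((str, bytes), _b64),
}


def apply_transformers(names: List[str], value: Value) -> Value:
    """Apply a chain written outermost-first, e.g. ["toupper", "b64"]."""
    for name in reversed(names):  # the innermost transformer runs first
        accepted, fn = TRANSFORMERS[name]
        if not isinstance(value, accepted):
            raise TypeError(f"{name}() does not accept {type(value).__name__} values")
        value = fn(value)
    return value


# toupper(b64(fd.name)) = TESTFILE, evaluated on a made-up fd.name value
fd_name = base64.b64encode(b"testfile").decode("ascii")
print(apply_transformers(["toupper", "b64"], fd_name) == "TESTFILE")
```

Checking the accepted types before each step is what makes a chain such as tolower() over a bytebuf fail early instead of producing an ambiguous result.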
The grammar of the filtering language (current state: https://github.com/falcosecurity/libs/blob/eadccc563aa42baea827b7782a2159033a52d765/userspace/libsinsp/filter/parser.h#L27) will evolve in the following backward-compatible and non-ambiguous way:
Productions (EBNF Syntax):
Expr ::= OrExpr
OrExpr ::= AndExpr ('or' OrExprTail)*
OrExprTail ::= ' ' AndExpr
| '(' Expr ')'
AndExpr ::= NotExpr ('and' AndExprTail)*
AndExprTail ::= ' ' NotExpr
| '(' Expr ')'
NotExpr ::= ('not ')* NotExprTail
NotExprTail ::= 'not(' Expr ')'
| Check
Check ::= Field Condition
| FieldTransformer Condition
| Identifier
| '(' Expr ')'
FieldTransformer ::= FieldTransformerType FieldTransformerTail
FieldTransformerTail ::= FieldTransformerArg ')'
FieldTransformerArg ::= FieldTransformer
| Field
FieldTransformerOrVal ::= FieldTransformer
| FieldTransformerVal Field ')'
Condition ::= UnaryOperator
| NumOperator (NumValue | FieldTransformerOrVal)
| StrOperator (StrValue | FieldTransformerOrVal)
| ListOperator (ListValue | FieldTransformerOrVal)
ListValue ::= '(' (StrValue (',' StrValue)*)* ')'
| Identifier
Field ::= FieldName('[' FieldArg ']')?
FieldArg ::= QuotedStr | FieldArgBareStr
NumValue ::= HexNumber | Number
StrValue ::= QuotedStr | BareStr
Supported Check Operators (EBNF Syntax):
UnaryOperator ::= 'exists'
NumOperator ::= '<=' | '<' | '>=' | '>'
StrOperator ::= '==' | '=' | '!='
| 'glob ' | 'iglob '
| 'contains ' | 'icontains ' | 'bcontains '
| 'startswith ' | 'bstartswith ' | 'endswith '
ListOperator ::= 'intersects' | 'in' | 'pmatch'
FieldTransformerVal ::= 'val('
FieldTransformerType ::= 'tolower(' | 'toupper(' | 'b64('
Tokens (Regular Expressions):
Identifier ::= [a-zA-Z]+[a-zA-Z0-9_]*
FieldName ::= [a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+
FieldArgBareStr ::= [^ \b\t\n\r\[\]"']+
HexNumber ::= 0[xX][0-9a-zA-Z]+
Number ::= [+\-]?[0-9]+[\.]?[0-9]*([eE][+\-][0-9]+)?
QuotedStr ::= "(?:\\"|.)*?"|'(?:\\'|.)*?'
BareStr ::= [^ \b\t\n\r\(\),="']+
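As an illustration of the FieldTransformer and FieldTransformerArg productions, the following Python sketch parses a (possibly nested) transformer expression by recursive descent. It is only a sketch under simplifying assumptions: the function name and the tuple-based AST are hypothetical, the optional '[' FieldArg ']' suffix of Field is not handled, and no whitespace is accepted inside the parentheses.

```python
import re

# FieldName token, copied from the grammar above.
FIELD_NAME = re.compile(r"[a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+")
# FieldTransformerType alternatives, copied from the grammar above.
TRANSFORMER_TYPES = ("tolower(", "toupper(", "b64(")


def parse_field_or_transformer(text, pos=0):
    """Parse `FieldTransformer | Field` starting at `pos`.

    Returns (ast, new_pos), where ast is ("field", name) or
    ("transformer", type, inner_ast).
    """
    for token in TRANSFORMER_TYPES:
        if text.startswith(token, pos):
            # FieldTransformerTail ::= FieldTransformerArg ')'
            inner, pos = parse_field_or_transformer(text, pos + len(token))
            if not text.startswith(")", pos):
                raise ValueError(f"expected ')' at position {pos}")
            return ("transformer", token[:-1], inner), pos + 1
    match = FIELD_NAME.match(text, pos)
    if match is None:
        raise ValueError(f"expected a field name at position {pos}")
    return ("field", match.group(0)), match.end()


ast, end = parse_field_or_transformer("toupper(b64(fd.name))")
print(ast)
```

Because each transformer token includes its opening parenthesis, a transformer name can never be confused with an Identifier or FieldName, which is one way to read the claim that the extension is non-ambiguous.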
Additional context
The val(<field>) transformer is a special no-op transformer that's needed at the language parser level in order to disambiguate field references from raw string values. For clarity:
- proc.name = proc.pname: Evaluates to true for processes whose comm is the literal string proc.pname, and is equivalent to proc.name = "proc.pname"
- proc.name = val(proc.pname): Evaluates to true for processes whose comm is the same as their parent's comm
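The difference between the two forms can be sketched in Python as follows. This is an illustration only: the resolve_rhs helper and the dictionary-based event are hypothetical, and the field values are made up.

```python
def resolve_rhs(rhs, event):
    """Return the value that the right-hand side of a comparison stands for."""
    if rhs.startswith("val(") and rhs.endswith(")"):
        # val(<field>): look the field up in the event (field-to-field comparison).
        return event[rhs[len("val("):-1]]
    if len(rhs) >= 2 and rhs[0] == rhs[-1] and rhs[0] in "\"'":
        # Quoted string: strip the quotes and use the content literally.
        return rhs[1:-1]
    # Bare string: always a literal, even when it looks like a field name.
    return rhs


# Made-up event in which a process has the same comm as its parent.
event = {"proc.name": "bash", "proc.pname": "bash"}

print(event["proc.name"] == resolve_rhs("proc.pname", event))       # compares with the literal string
print(event["proc.name"] == resolve_rhs("val(proc.pname)", event))  # compares with the parent's comm
```

Keeping bare strings literal is what preserves the behavior of existing rules, at the cost of requiring the explicit val() wrapper for field references.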
/milestone 0.17.0
@jasondellaluce: The provided milestone is not valid for this repository. Milestones in this repository: [0.16.0, 0.17.0, TBD, next-driver]. Use /milestone clear to clear the milestone.
/milestone 0.38.0
I guess you wanted to select the last libs milestone before Falco 0.38, if so: /milestone 0.17.0
Additional proposal (can be implemented later):
- basename(): replicates the basename behavior, useful in combination with fields holding file paths (for example, basename(proc.exepath), whose result may differ from proc.exe)
cc @loresuso
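A rough Python sketch of what such a transformer could compute is shown below. The exact semantics of the proposed basename() are not specified in this thread, so the helper and the sample path are assumptions.

```python
import posixpath


def basename_transformer(path):
    """Hypothetical basename(): keep only the last component of a path."""
    return posixpath.basename(path)


# Made-up proc.exepath value.
print(basename_transformer("/usr/bin/python3.11"))
```

Note that posixpath.basename returns an empty string for a path ending in "/", unlike the basename(1) utility, so trailing slashes are one detail an actual implementation would need to settle.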
Additional proposal (can be implemented later):
- join(<list>, <sep>): concatenates the elements of the given <list> (with <sep> placed between elements), especially useful in output: for printing lists with a custom separator
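The intended behavior can be sketched in Python as below. The helper name and the sample list are hypothetical, and converting non-string elements with str() is an assumption rather than part of the proposal.

```python
def join_transformer(values, sep):
    """Hypothetical join(<list>, <sep>): concatenate list elements with a separator."""
    return sep.join(str(v) for v in values)


# Made-up list-typed field value, formatted for an output line.
print(join_transformer(["bash", "sshd", "systemd"], " <- "))
```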
> The val(<field>) transformer is a special no-op transformer that's needed at the language parser level in order to disambiguate field references from raw string values. For clarity: proc.name = proc.pname: Evaluates true for process of which comm is the proc.pname string, and is equivalent to proc.name = "proc.pname"; proc.name = val(proc.pname): Evaluates true for process of which comm is the same as its parent's comm
Understood, it will likely cause a bit of confusion and we need to document it very clearly. If we can think of alternatives that do not require val() we should consider them as well.
I agree with this. As part of this work, the plan is also to make the sinsp compiler emit warnings for potential mistakes in this regard. Unfortunately, we explored many options and there is no better grammar construct we can employ that would not lead to potential breaking changes in the filtering language and in the Falco rulesets out there. Although ugly-ish, this should guarantee complete backward compatibility with the status quo.
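One possible shape for such a warning is sketched below in Python. The thread does not describe the actual check performed by the sinsp compiler, so the lint_rhs helper, its message, and the optional known_fields argument are purely hypothetical.

```python
import re

# FieldName token from the proposed grammar.
FIELD_NAME = re.compile(r"[a-zA-Z]+[a-zA-Z0-9_]*(\.[a-zA-Z]+[a-zA-Z0-9_]*)+")


def lint_rhs(rhs, known_fields=None):
    """Return a warning if a bare right-hand-side string looks like a field name."""
    if not FIELD_NAME.fullmatch(rhs):
        return None
    if known_fields is not None and rhs not in known_fields:
        # Shaped like a field but not a known one (e.g. "foo.txt"): stay silent.
        return None
    return f"string value '{rhs}' looks like a field name; did you mean val({rhs})?"


print(lint_rhs("proc.pname", known_fields={"proc.name", "proc.pname"}))
print(lint_rhs("/etc/passwd"))
```

Restricting the warning to known field names avoids false positives on ordinary values such as file names that happen to contain a dot.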
Corresponding documentation PR: https://github.com/falcosecurity/falco-website/pull/1319
Considering that the docs PR is open and that the 0.17.0 libs tag is out, I think we can close this one. /close
@FedeDP: Closing this issue.