influxdb-client-java
influxdb-client-java copied to clipboard
Proposal: Java DSL should construct AST instead of string building the Flux queries.
The Java DSL API creates a convenient way for constructing Flux queries. The current implementation does so by building a Flux source string from the parameters to the various Java classes.
String building Flux queries opens the door to possible injection attacks from user input. I propose that the DSL construct AST instead of strings this way it can guarantee that the constructed query is safe and contains only the contents that where expected.
Here is an example of how the current DSL could be leveraged to inject arbitrary Flux.
First we start with the most basic usage of the DSL API.
Flux flux = Flux
.from("telegraf")
.window(15L, ChronoUnit.MINUTES, 20L, ChronoUnit.SECONDS)
.sum();
Now let's imagine that the bucket name is not hardcoded but instead is user input as a parameter to the code:
Flux flux = Flux
.from(bucket)
.window(15L, ChronoUnit.MINUTES, 20L, ChronoUnit.SECONDS)
.sum();
If a user provides this string as their bucket name they can inject any Flux code they wish into the query.
"+ (()=> {
// any Flux code you want as the body of the function
return "telegraf"
})() + "
The above would produce this completely valid Flux code:
from(bucket: "" + (() => {
// any Flux code you want as the body of the function
return "telegraf"
})() + "")
|> window(every:15m, period: 20s)
|> sum()
Flux will execute the anonymous function to determine the value of the bucket
argument, and since that function returns a valid bucket name telegraf
no errors would be thrown.
How does using AST prevent this problem?
If the From
function created an AST representation of the query then the value to the bucket
argument would be encoded as the StringLiteral
AST node, something like this:
{
"type" : "StringLiteral",
"value": "\" + (()=> {\n // any Flux code you want as the body of the function\n return \"telegraf\"\n})() + \""
}
Then the equivalent Flux that would have been produced is:
from(bucket: "\"\" + (() => {
// any Flux code you want as the body of the function
return \"telegraf\"
})() + \"\"")
|> window(every:15m, period: 20s)
|> sum()
This time the value of the bucket argument is no longer arbitrary code, but rather a long string. Note that using the AST the query is not ever converted back into Flux source code during processing, the engine can consume and process the AST directly without having to format and reparse it.
I propose that the DSL construct POJO s that represent the AST and then serialize those objects to JSON and submit them over the HTTP API.
The JSON encoding of the Flux AST is part of the HTTP swagger document. For example here is the section on StringLiterals: https://github.com/influxdata/influxdb/blob/master/http/swagger.yml#L6962
Hi @nathanielc,
The our API is prepared to consume AST queries:
- https://github.com/influxdata/influxdb-client-java/blob/9c8af0c1af5cc4519407b7cb3a91419027ce8ec9/client/src/main/java/com/influxdb/client/QueryApi.java#L87
- https://github.com/influxdata/influxdb-client-java/blob/31834c04033c73b4c8635d9e50b0ac830d4a9931/client/src/generated/java/com/influxdb/client/domain/Query.java#L38
- https://github.com/influxdata/influxdb-client-java/blob/b89c5e8923bd380c6c5b62807a074bbdd411ac24/client/src/generated/java/com/influxdb/client/domain/StringLiteral.java#L32
so we have to prepare something like toQuery
option that will transform DSL API into Query
with AST
in the extern
field: https://github.com/influxdata/influxdb/blob/992791177a24e902156eb7ae9da9d097662522d5/http/swagger.yml#L6547.
Something like:
Flux flux = Flux
.from("telegraf")
.window(15L, ChronoUnit.MINUTES, 20L, ChronoUnit.SECONDS)
.sum();
Query query = flux.toQuery();
...
Regards
@bednar That sounds great! How will toQuery
work? Will the DSL API construct AST as it goes along and the toQuery
just wraps the final AST root?
Yes, or maybe something like toAST()
will be better.
Also if it helps the AST for Flux is defined in Go here https://godoc.org/github.com/influxdata/flux/ast and in Rust here https://github.com/influxdata/flux/blob/master/libflux/core/src/ast/mod.rs
The AST is also in Java - https://github.com/influxdata/influxdb-client-java/tree/master/client/src/generated/java/com/influxdb/client/domain.
Is there a simple way how We can transform "function parameter expression" to corresponding AST structure?
For example, the duration -6h
to
{
"type": "UnaryExpression",
"operator": "-",
"argument": {
"type": "DurationLiteral",
"values": [{
"magnitude": 6,
"unit": "h"
}]
}
}
The DataExplorer uses this asAssignment
function, but it looks too tricky:
https://github.com/influxdata/influxdb/blob/020d938537de199246e52e91760af54ecc155075/ui/src/variables/selectors/index.tsx#L198
@bednar I am not sure what you are asking, the Java AST representation looks good, so if you use that to generate the Flux AST as JSON it should work well.
I am not sure how asAssignment
is used in the UI and why it has that complexity.