influxdb-client-java icon indicating copy to clipboard operation
influxdb-client-java copied to clipboard

Proposal: Java DSL should construct AST instead of string building the Flux queries.

Open nathanielc opened this issue 4 years ago • 6 comments

The Java DSL API creates a convenient way for constructing Flux queries. The current implementation does so by building a Flux source string from the parameters to the various Java classes.

String building Flux queries opens the door to possible injection attacks from user input. I propose that the DSL construct AST instead of strings this way it can guarantee that the constructed query is safe and contains only the contents that where expected.

Here is an example of how the current DSL could be leveraged to inject arbitrary Flux.

First we start with the most basic usage of the DSL API.

Flux flux = Flux
    .from("telegraf")
    .window(15L, ChronoUnit.MINUTES, 20L, ChronoUnit.SECONDS)
    .sum();

Now let's imagine that the bucket name is not hardcoded but instead is user input as a parameter to the code:

Flux flux = Flux
    .from(bucket)
    .window(15L, ChronoUnit.MINUTES, 20L, ChronoUnit.SECONDS)
    .sum();

If a user provides this string as their bucket name they can inject any Flux code they wish into the query.

"+ (()=> {
    // any Flux code you want as the body of the function
   return "telegraf"
})() + "

The above would produce this completely valid Flux code:

from(bucket: "" + (() => {
    // any Flux code you want as the body of the function
   return "telegraf"
})() + "")
    |> window(every:15m, period: 20s)
    |> sum()

Flux will execute the anonymous function to determine the value of the bucket argument, and since that function returns a valid bucket name telegraf no errors would be thrown.

How does using AST prevent this problem?

If the From function created an AST representation of the query then the value to the bucket argument would be encoded as the StringLiteral AST node, something like this:

{
    "type" : "StringLiteral",
    "value": "\" + (()=> {\n    // any Flux code you want as the body of the function\n   return \"telegraf\"\n})() + \""
}

Then the equivalent Flux that would have been produced is:

from(bucket: "\"\" + (() => {
    // any Flux code you want as the body of the function
   return \"telegraf\"
})() + \"\"")
    |> window(every:15m, period: 20s)
    |> sum()

This time the value of the bucket argument is no longer arbitrary code, but rather a long string. Note that using the AST the query is not ever converted back into Flux source code during processing, the engine can consume and process the AST directly without having to format and reparse it.

I propose that the DSL construct POJO s that represent the AST and then serialize those objects to JSON and submit them over the HTTP API.

The JSON encoding of the Flux AST is part of the HTTP swagger document. For example here is the section on StringLiterals: https://github.com/influxdata/influxdb/blob/master/http/swagger.yml#L6962

nathanielc avatar Dec 03 '20 23:12 nathanielc

Hi @nathanielc,

The our API is prepared to consume AST queries:

  • https://github.com/influxdata/influxdb-client-java/blob/9c8af0c1af5cc4519407b7cb3a91419027ce8ec9/client/src/main/java/com/influxdb/client/QueryApi.java#L87
  • https://github.com/influxdata/influxdb-client-java/blob/31834c04033c73b4c8635d9e50b0ac830d4a9931/client/src/generated/java/com/influxdb/client/domain/Query.java#L38
  • https://github.com/influxdata/influxdb-client-java/blob/b89c5e8923bd380c6c5b62807a074bbdd411ac24/client/src/generated/java/com/influxdb/client/domain/StringLiteral.java#L32

so we have to prepare something like toQuery option that will transform DSL API into Query with AST in the extern field: https://github.com/influxdata/influxdb/blob/992791177a24e902156eb7ae9da9d097662522d5/http/swagger.yml#L6547.

Something like:

Flux flux = Flux
    .from("telegraf")
    .window(15L, ChronoUnit.MINUTES, 20L, ChronoUnit.SECONDS)
    .sum();

Query query = flux.toQuery();
...

Regards

bednar avatar Dec 04 '20 08:12 bednar

@bednar That sounds great! How will toQuery work? Will the DSL API construct AST as it goes along and the toQuery just wraps the final AST root?

nathanielc avatar Dec 04 '20 16:12 nathanielc

Yes, or maybe something like toAST() will be better.

bednar avatar Dec 07 '20 14:12 bednar

Also if it helps the AST for Flux is defined in Go here https://godoc.org/github.com/influxdata/flux/ast and in Rust here https://github.com/influxdata/flux/blob/master/libflux/core/src/ast/mod.rs

nathanielc avatar Dec 09 '20 17:12 nathanielc

The AST is also in Java - https://github.com/influxdata/influxdb-client-java/tree/master/client/src/generated/java/com/influxdb/client/domain.

Is there a simple way how We can transform "function parameter expression" to corresponding AST structure?

For example, the duration -6h to

{
	"type": "UnaryExpression",
	"operator": "-",
	"argument": {
		"type": "DurationLiteral",
		"values": [{
			"magnitude": 6,
			"unit": "h"
		}]
	}
}

The DataExplorer uses this asAssignment function, but it looks too tricky:

https://github.com/influxdata/influxdb/blob/020d938537de199246e52e91760af54ecc155075/ui/src/variables/selectors/index.tsx#L198

bednar avatar Dec 11 '20 08:12 bednar

@bednar I am not sure what you are asking, the Java AST representation looks good, so if you use that to generate the Flux AST as JSON it should work well.

I am not sure how asAssignment is used in the UI and why it has that complexity.

nathanielc avatar Dec 14 '20 23:12 nathanielc