Add array-style indexing syntax for streams and tables

Open pauldix opened this issue 7 years ago • 0 comments

This is to add an array index syntax to quickly look up parts of streams or tables. It provides a more terse shortcut syntax for filter, range, and group. The language spec might look something like this:

<stream object>[<predicate>,<time>:<time>,<list of strings>]

// and here's an example
from(bucket:"foo")[_measurement == "cpu" and _field == "usage_user", 2018-11-07:2018-11-08, ["_measurement", "_time", "_value", "_field"]]

The stream object is any function that outputs tables (that is, whatever is pipe-forwarded to most functions).

The predicate is like what is passed into filter, but it has more limitations. Specifically, the left-hand operand must be an identifier (tag key). If it contains whitespace or commas, it must be in double quotes. The right-hand operand is always a value or a variable identifier. Parentheses can be used for precedence.
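To make the left-hand rule concrete, here is a minimal Python sketch of how such a predicate could be rewritten into the body of a filter call. The function name `predicate_to_filter` and the tokenizer are hypothetical and not part of any Flux implementation; the sketch assumes every token that directly precedes a comparison operator is a tag key, and it silently ignores characters it does not recognize.

```python
import re

# Hypothetical tokenizer: double-quoted strings, comparison operators,
# parentheses, and bare words (identifiers, keywords, literals).
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|==|!=|<=|>=|<|>|\(|\)|[^\s()=!<>"]+)')
COMPARISONS = {"==", "!=", "<=", ">=", "<", ">"}


def predicate_to_filter(predicate: str) -> str:
    """Rewrite a restricted index predicate as a longhand filter call."""
    tokens = TOKEN.findall(predicate)
    out = []
    for i, tok in enumerate(tokens):
        # A token is a left-hand operand if a comparison operator follows it.
        is_left = i + 1 < len(tokens) and tokens[i + 1] in COMPARISONS
        if is_left and tok.startswith('"'):
            out.append(f"row[{tok}]")   # quoted tag key -> row["some tag"]
        elif is_left:
            out.append(f"row.{tok}")    # bare tag key -> row.foo
        else:
            out.append(tok)             # operators, values, and/or, parens
    return "filter(fn: (row) => " + " ".join(out) + ")"


print(predicate_to_filter('"some tag" == "asdf"'))
```

Because only the token before a comparison is ever treated as a column reference, the right-hand side passes through untouched, which is what lets it be either a value or a variable identifier.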

The time range specified in the middle argument can actually be any kind of range, but it must match the type that the tables in the stream are sorted by. In the above example, the tables come back sorted by time, so the first part of the range is the start and the second part is the end. Both can be left out, which is equivalent to calling last(). The start and the end can also be relative times given as a duration like -1h.
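The cases for the middle argument can be sketched in Python as follows. The helper `range_to_call` is hypothetical. It assumes bounds are date-only literals or durations (a full timestamp containing colons would make the `start:end` separator ambiguous, which the proposal does not address), and it emits `stop` for the upper bound because that is the parameter name Flux's `range` uses.

```python
def range_to_call(range_arg: str) -> str:
    """Translate the middle index argument into a longhand call."""
    arg = range_arg.strip()
    if not arg:
        # Both bounds left out: the proposal maps this to last().
        return "last()"
    # Assumption: at most one ':' separates the start from the end.
    start, _, stop = (part.strip() for part in arg.partition(":"))
    if not start:
        # An end without a start is not defined in the proposal.
        raise ValueError("range needs a start when an end is given")
    if not stop:
        return f"range(start: {start})"
    return f"range(start: {start}, stop: {stop})"


for example in ["", "-1h", "2018-11-07:2018-11-08"]:
    print(repr(example), "->", range_to_call(example))
```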

Finally, the last argument is a list of columns to keep. This also effectively changes the group key for the result.
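Taken together, the index body has up to three comma-separated arguments, and quoted tag keys or the column list may themselves contain commas. A minimal Python sketch of splitting the body at top-level commas only is shown below. The name `split_index` is hypothetical, and the sketch assumes double quotes are never escaped inside a string.

```python
def split_index(body: str) -> list[str]:
    """Split the text between the outer [ ] into predicate, range, columns."""
    parts, current = [], []
    depth = 0          # nesting depth of [ ] inside the body
    in_string = False  # True while inside a double-quoted string
    for ch in body:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
            elif ch == "," and depth == 0:
                # Top-level comma: close the current argument.
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    parts.append("".join(current).strip())
    if len(parts) > 3:
        raise ValueError("an index takes at most three arguments")
    # Trailing arguments may be left off; pad them with empty strings.
    return parts + [""] * (3 - len(parts))


print(split_index('foo=="bar",-1h'))
```

An empty string in the second or third position then stands for an omitted argument, so an index with trailing commas and one without them split to the same result.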

I see many advantages to this syntax. First, it offers something that is more terse for users to type from the command line or REPL. Also, it makes building an autocomplete GUI much simpler. Forcing the left-hand operands in the predicate to be tag keys makes it easy to build a drill-down interface.

Here are a few examples along with the equivalent longhand forms:

from(bucket:"foo")[_measurement == "cpu" and _field == "usage_user", 2018-11-07:2018-11-08, ["_measurement", "_time", "_value", "_field"]]

from(bucket:"foo")
  |> filter(fn: (row) => row._measurement == "cpu" and row._field == "usage_user")
  |> range(start: 2018-11-07, stop: 2018-11-08)
  |> keep(columns: ["_measurement", "_time", "_value", "_field"])

from(bucket:"foo")[_measurement == "cpu"]
// notice the trailing commas can be left off
from(bucket: "foo")
  |> filter(fn: (row) => row._measurement == "cpu")
  |> last()

from(bucket:"foo")["some tag" == "asdf",,]

from(bucket: "foo")
  |> filter(fn: (row) => row["some tag"] == "asdf")
  |> last()

from(bucket:"foo")[foo=="bar",-1h]

from(bucket: "foo")
  |> filter(fn: (row) => row.foo == "bar")
  |> range(start: -1h)

Already talked to @nathanielc and @stuartcarnie about this at InfluxDays. Curious to hear from @davkal @121watts @bthesorceror

Nov 09 '18 02:11 pauldix