flux icon indicating copy to clipboard operation
flux copied to clipboard

Mapping in joined table breaks join and causes no results

Open oagudo opened this issue 6 years ago • 7 comments

When joining 2 tables, we consistently see queries breaking when either input table uses map. When joining them without using map it works perfectly. However, if any of the joined tables contains a map operation on the _value fields, the join result is empty. While our original intention was to use map or stateCount to perform conditional sum, we think this may be part of a larger bug, or something undocumented about how flux works that we're not understanding.

This is reproducible even with incredibly simple maps, such as multiplying the value by 2, or even just mapping it directly. Even stranger is that even when the join produces no results, each individual query still produces a valid table, with columns that indicate a join should be successful.

An example that works fine, grouping, summing by each group, and then ungrouping to a single table:

table1 = from(bucket:"bucket/autogen") 
|> range(start:last_hour) 
|> filter(fn: (r) => r._measurement == "myMeasurement" and r.key =~ /^myKey$/ and r._field == "myField")
|> group(columns: ["columnA", "columnB"], mode:"by")
|> sum()
|> group(columns:[])

Second table is very similar to the above table1, and we have not needed to change it. Our Join, with a computed metric between the values of joined tables:

join( tables: {first:table1, second:table2}, on: ["columnA", "columnB"])
|>map(fn: (r) => ({ columnA: r.columnA, columnB: r.columnB, computedMetric : math.round(x: float(v : r._value_first) / float(v : r._value_second) * 100.0)}))

This all works fine, until we try to add any sort of mapping on table1. Then immediately we get a "no results" in chronograf, which in inspecting network traffic appears to be an empty 200 response. While originally we were trying to do a conditional sum, any mapping to value whatsoever seems to break the join.

Query example:

table1 = from(bucket:"bucket/autogen") 
|> range(start:last_hour) 
|> filter(fn: (r) => r._measurement == "myMeasurement" and r.key =~ /^myKey$/ and r._field == "myField")
|> group(columns: ["columnA", "columnB"], mode:"by")
|> map(fn: (r) => r._value, mergeKey: true)
|> sum()
|> group(columns:[])

Curious to see if this is some sort of flux bug, some illegal operation that isn't throwing an error, or just something we're doing wrong here (please advise if so).

oagudo avatar May 08 '19 08:05 oagudo

@oagudo What does the output of the map() function look like in table1?

sanderson avatar May 08 '19 14:05 sanderson

@sanderson The output of the query removing the statements after map is:

from(bucket:"bucket/autogen") |> range(start:last_hour) |> filter(fn: (r) => r._measurement == "myMeasurement" and r.key =~ /^myKey$/ and r._field == "myField") |> group(columns: ["columnA", "columnB"], mode:"by") |> map(fn: (r) => r._value, mergeKey: true)

#datatype string long dateTime:RFC3339 dateTime:RFC3339 string string double
#group FALSE FALSE TRUE TRUE TRUE TRUE FALSE
#default _result            
  result table _start _stop columnA columnB _value
    0 2019-04-10T15:47:36.7256795Z 2019-05-10T15:47:36.7256795Z AValue1 BValue1 1
    0 2019-04-10T15:47:36.7256795Z 2019-05-10T15:47:36.7256795Z AValue2 BValue2 1
    1 2019-04-10T15:47:36.7256795Z 2019-05-10T15:47:36.7256795Z AValue3 BValue3 1
    1 2019-04-10T15:47:36.7256795Z 2019-05-10T15:47:36.7256795Z AValue4 BValue4 1
....

oagudo avatar May 10 '19 15:05 oagudo

Ok, add sum() back in there. What's the output after sum()?

sanderson avatar May 10 '19 16:05 sanderson

After adding sum the result is the following:

from(bucket:"bucket/autogen") |> range(start:last_hour) |> filter(fn: (r) => r._measurement == "myMeasurement" and r.key =~ /^myKey$/ and r._field == "myField") |> group(columns: ["columnA", "columnB"], mode:"by") |> map(fn: (r) => r._value, mergeKey: true) |> sum()

#datatype string long dateTime:RFC3339 dateTime:RFC3339 string string double
#group FALSE FALSE TRUE TRUE TRUE TRUE FALSE
#default _result            
  result table _start _stop columnA columnB _value
    0 2019-05-05T07:28:00.1880642Z 2019-05-15T07:28:00.1880642Z AValue1 BValue1 4
    1 2019-05-05T07:28:00.1880642Z 2019-05-15T07:28:00.1880642Z AValue1 BValue2 3
    2 2019-05-05T07:28:00.1880642Z 2019-05-15T07:28:00.1880642Z AValue1 BValue3 2
    3 2019-05-05T07:28:00.1880642Z 2019-05-15T07:28:00.1880642Z AValue1 BValue4 1
    4 2019-05-05T07:28:00.1880642Z 2019-05-15T07:28:00.1880642Z AValue1 BValue5 1
....

oagudo avatar May 15 '19 07:05 oagudo

I am having the same issue with the OP.
A join() of TableA and TableB return a table with no problem. Now if either TableA or TableB are first joined with a third table, they no longer return anything but "No results" when I attempt to join them together. I even get "no results" after I have passed TableA through a join() that does not modify it (TableA) at all. (I compared the downloaded .CSVs) On the other hand if I pass both TableA and TableB (individually) through a non-modifying join() they may then be joined together without a problem. The "no results" only happens when only one of the two original tables have been passed through join() before attempting to join them together. It's very strange.

hubdr avatar Jun 04 '21 20:06 hubdr

I am having the same issue with the OP. A join() of TableA and TableB return a table with no problem. Now if either TableA or TableB are first joined with a third table, they no longer return anything but "No results" when I attempt to join them together. I even get "no results" after I have passed TableA through a join() that does not modify it (TableA) at all. (I compared the downloaded .CSVs) On the other hand if I pass both TableA and TableB (individually) through a non-modifying join() they may then be joined together without a problem. The "no results" only happens when only one of the two original tables have been passed through join() before attempting to join them together. It's very strange.

In my case it seems the issue has to do with the column order (as listed under raw data.) If a join() modifies the column order of one of the tables then that table cannot be joined with a table with unmodified column order.

hubdr avatar Jun 04 '21 22:06 hubdr

I'm having the same issue, and suspect that the reason can be as @hubdr mentioned regarding the column order. I'm joining on "_time" and some tag, however after the first join() the table that results pushes those two columns to the end and the second join that tries to add some other column fails.

aortuno-liberty avatar Jul 01 '21 15:07 aortuno-liberty

This issue has had no recent activity and will be closed soon.

github-actions[bot] avatar Aug 08 '24 01:08 github-actions[bot]