
`range()` does not filter correctly depending on year


Steps to reproduce:

I have a dataset where the points are years apart. The bucket was created in version 2.0.7 with a shard group duration of 1000w; I am now on version 2.7.5.

  1. The following query works as expected:
rx = /(Apfel)$/

from(bucket: "common_two_grams")
	|> range(start: 1971-01-01, stop: 2020-01-01)
	|> filter(fn: (r) => r["_measurement"] == "frequency")
	|> filter(fn: (r) => r["_field"] == "match_count")
	|> filter(fn: (r) => r["w1"] =~ rx or r["w2"] =~ rx)
	|> group(columns: ["_time"])
	|> sum()
	|> group()
  2. The following query does not work as expected; notice that only the start year changed from 1971 to 1970.
rx = /(Apfel)$/

from(bucket: "common_two_grams")
	|> range(start: 1970-01-01, stop: 2020-01-01)
	|> filter(fn: (r) => r["_measurement"] == "frequency")
	|> filter(fn: (r) => r["_field"] == "match_count")
	|> filter(fn: (r) => r["w1"] =~ rx or r["w2"] =~ rx)
	|> group(columns: ["_time"])
	|> sum()
	|> group()

Expected behaviour: This is the correct data, which I expect for the first query:

#group,false,false,false,false
#datatype,string,long,dateTime:RFC3339,unsignedLong
#default,_result,,,
,result,table,_time,_value
,,0,1971-11-29T23:00:00Z,9940
,,0,1972-11-29T23:00:00Z,9080
[....]
,,0,2018-11-29T23:00:00Z,69394

Actual behaviour: The second query does not filter on time correctly; the complete time series is returned. Notice how it begins in 1725:

#group,false,false,false,false
#datatype,string,long,dateTime:RFC3339,unsignedLong
#default,_result,,,
,result,table,_time,_value
,,0,1725-11-29T23:06:32Z,42
,,0,1726-11-29T23:06:32Z,28
,,0,1727-11-29T23:06:32Z,71
,,0,1728-11-29T23:06:32Z,49
,,0,1729-11-29T23:06:32Z,81
[...]
,,0,2015-11-29T23:00:00Z,87953
,,0,2016-11-29T23:00:00Z,90141
,,0,2017-11-29T23:00:00Z,74672
,,0,2018-11-29T23:00:00Z,69394

When aggregateWindow is added to the query, the range seems to work again:

rx = /(Apfel)$/

from(bucket: "common_two_grams")
	|> range(start: 1970-01-01, stop: 2020-01-01)
	|> filter(fn: (r) => r["_measurement"] == "frequency")
	|> filter(fn: (r) => r["_field"] == "match_count")
	|> filter(fn: (r) => r["w1"] =~ rx or r["w2"] =~ rx)
	|> aggregateWindow(every: 1y, fn: last)
	|> group(columns: ["_time"])
	|> sum()
	|> group()

Environment info:

Running inside a Docker container with version 2.7.5.

paulheg avatar Feb 14 '24 16:02 paulheg

I can confirm the behavior in 2.7. I believe it is related to the fact that a Unix timestamp of zero is 1970-01-01T00:00:00Z. Timestamps before that are negative, and it appears they are not handled correctly in the Flux query (or its execution).
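
To make the epoch point concrete, here is a minimal Go sketch (not InfluxDB code) showing that the second query's start time has a nanosecond value of exactly zero, while the 1725 points carry negative values:

package main

import (
	"fmt"
	"time"
)

func main() {
	// The second query's start, 1970-01-01T00:00:00Z, is exactly the Unix epoch.
	epoch := time.Date(1970, 1, 1, 0, 0, 0, 0, time.UTC)
	fmt.Println(epoch.UnixNano()) // 0

	// Points from 1725 have negative nanosecond timestamps.
	old := time.Date(1725, 11, 29, 23, 6, 32, 0, time.UTC)
	fmt.Println(old.UnixNano()) // roughly -7.7e18
}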

If it is an issue with Flux, you might be able to work around it with InfluxQL. (I checked - InfluxQL does better.)

We would welcome a PR fixing the issue if you are willing.

philjb avatar Feb 15 '24 03:02 philjb

I just checked whether it's an issue with Flux by modifying range_test.flux, and it does not seem to be the case.

package universe_test


import "testing"
import "csv"

option now = () => 2030-01-01T00:00:00Z

inData =
    "
#datatype,string,long,dateTime:RFC3339,double,string,string,string,string
#group,false,false,false,false,true,true,true,true
#default,_result,,,,,,,
,result,table,_time,_value,_field,_measurement,cpu,host
,,0,1701-01-01T00:00:01Z,0,usage_guest,cpu,cpu-total,host.local
,,0,1969-01-01T00:00:01Z,0,usage_guest,cpu,cpu-total,host.local
,,0,1970-01-01T00:00:01Z,0,usage_guest,cpu,cpu-total,host.local
,,0,1971-01-01T00:00:01Z,0,usage_guest,cpu,cpu-total,host.local
"
outData =
    "
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#group,false,false,true,true,false,false,true,true,true,true
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,cpu,host
,,0,1970-01-01T00:00:00Z,2030-01-01T00:00:00Z,1970-01-01T00:00:01Z,0,usage_guest,cpu,cpu-total,host.local
,,0,1970-01-01T00:00:00Z,2030-01-01T00:00:00Z,1971-01-01T00:00:01Z,0,usage_guest,cpu,cpu-total,host.local
"

testcase range {
    got =
        csv.from(csv: inData)
            |> testing.load()
            |> range(start: 1970-01-01)
    want = csv.from(csv: outData)

    testing.diff(got, want)
}

paulheg avatar Feb 18 '24 12:02 paulheg

Hi @philjb, as far as I understand, Flux is doing the filtering; I found the Go code for that in the flux repository. The test from my last comment suggests to me that something may be wrong in the database rather than in the query language.

Can you give me a general direction for where to search for the possible issue in the influxdb repository?

Best regards, Paul

paulheg avatar Feb 19 '24 19:02 paulheg

I'd recommend reducing your Flux query to the smallest query that shows the behavior - for example, is the aggregate window essential to reproducing it? Then I'd look at the implementations of the storage interface (type Store interface). All the requests include a Range *TimestampRange field, which stores the query time range. I suspect there is special handling (that is "wrong") when the range start time is zero.
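
As a purely hypothetical illustration of that kind of check (a sketch, not the actual InfluxDB or Flux source), a zero sentinel like the following would silently widen any range that starts exactly at the epoch:

// Hypothetical sketch only - not the actual InfluxDB/Flux code path.
package main

import (
	"fmt"
	"math"
)

// normalizeRange mimics a zero-sentinel check: it conflates "no start time
// given" (0) with "start exactly at 1970-01-01T00:00:00Z" and replaces the
// latter with an unbounded start.
func normalizeRange(startNanos, stopNanos int64) (int64, int64) {
	if startNanos == 0 {
		startNanos = math.MinInt64 // treated as "unbounded"
	}
	return startNanos, stopNanos
}

func main() {
	// 1970-01-01T00:00:00Z .. 2020-01-01T00:00:00Z in nanoseconds.
	start, stop := normalizeRange(0, 1577836800000000000)
	fmt.Println(start, stop) // start is now math.MinInt64, not 0
}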

philjb avatar Feb 20 '24 17:02 philjb

Quick update: I just used FluxQL to select data between 1820 and 1830, and this works fine.

paulheg avatar Feb 27 '24 13:02 paulheg

Hi @philjb, I think I found the issue. My only problem now is that I'm unable to build the project because of dependencies from the flux repository, mainly because https://github.com/SAP/go-hdb v.0 does not seem to exist anymore.

I have already opened a separate issue: https://github.com/influxdata/flux/issues/5462. Sadly, I can't verify whether my fix is correct.

paulheg avatar Mar 07 '24 13:03 paulheg

I was able to verify the issue by running the binary on my local machine. I'm not sure why this check was done, but it seems to be the cause of the issue.

paulheg avatar Mar 07 '24 15:03 paulheg

@paulheg - were you able to build a binary to test with locally? I can't tell from your last comment whether you were able to build.

Can you add a test for the scenario to your PR?

philjb avatar Mar 07 '24 16:03 philjb

@philjb I was able to build it locally, run it, and verify the fix.

Just found a workaround:

GOPROXY=https://proxy.golang.org/cached-only go mod download github.com/SAP/[email protected]

I'll have a look, but I can't think of a useful test for this scenario right now. Maybe a check that the value I passed does not get modified?
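
Something along these lines, perhaps (a rough sketch using a stand-in helper, not the real code path):

package rangefix_test

import (
	"testing"
	"time"
)

// passThroughRange stands in for the code that forwards the query's time
// bounds to storage; the test asserts that an epoch start is not altered.
func passThroughRange(start, stop int64) (int64, int64) {
	return start, stop
}

func TestEpochStartIsNotWidened(t *testing.T) {
	start := time.Date(1970, 1, 1, 0, 0, 0, 0, time.UTC).UnixNano() // exactly 0
	stop := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC).UnixNano()

	gotStart, gotStop := passThroughRange(start, stop)
	if gotStart != start || gotStop != stop {
		t.Fatalf("range was modified: got (%d, %d), want (%d, %d)",
			gotStart, gotStop, start, stop)
	}
}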

paulheg avatar Mar 07 '24 16:03 paulheg

> Can you add a test for the scenario to your PR?

There is now a test in the PR.

paulheg avatar Mar 07 '24 17:03 paulheg