OpenMLDB
OpenMLDB copied to clipboard
Window union will process the future data in window for online request mode
Bug Description
Now we use online request mode to process data and get incorrect result.
Steps to Reproduce
Create the test database and tables.
CREATE DATABASE IF NOT EXISTS db1;
CREATE TABLE IF NOT EXISTS db1.t1(
ingestionTime timestamp,
new_user_id string,
index(key=(new_user_id), ttl=(0m, 0), ttl_type=absandlat, ts=`ingestionTime`)
);
CREATE TABLE IF NOT EXISTS db1.t2(
ingestionTime timestamp,
new_user_id string,
index(key=(new_user_id), ttl=(0m, 0), ttl_type=absandlat, ts=`ingestionTime`)
);
Insert the test data.
set @@execute_mode='online';
insert into db1.t2 values (100, "abc");
insert into db1.t2 values (101, "abc");
Deploy the SQL.
use db1;
deploy demo1 select
new_user_id as new_user_id,
count(`new_user_id`) over w as window_count,
from
db1.t1
window w as (
UNION (select * from db1.t2) partition by `new_user_id` order by `ingestionTime` rows between 10 preceding and current row INSTANCE_NOT_IN_WINDOW);
Now we use online request mode with the data (99, "abc"). Since the data in t2 is larger than request row, we should get count of window is 1.
curl http://127.0.0.1:9080/dbs/db1/deployments/demo1 -X POST -d '{ "input": [ [ 99, "abc" ] ] }'
{"code":0,"msg":"ok","data":{"data":[["abc",1]]}}
Test with other data and always get result of 3.
curl http://127.0.0.1:9080/dbs/db1/deployments/demo1 -X POST -d '{ "input": [ [ 200, "abc" ] ] }'
{"code":0,"msg":"ok","data":{"data":[["abc",3]]}}
Currently we can not reproduce the issue with command line to create table with indexes.
I can't reproduce the issue with this case:
- id: 41
inputs:
- columns: ["id int", "val string", "ts timestamp"]
indexs: ["index1:val:ts"]
rows:
- [3, 'abc', 99]
- [4, 'abc', 100]
- [5, 'abc', 101]
- columns: ["id int", "val string", "ts timestamp"]
indexs: ["index1:val:ts"]
rows:
- [1, 'abc', 100]
- [2, 'abc', 101]
sql: |
select id, val, count(val) over w as agg
from {0}
window w as (
union (select * from {1})
partition by val order by ts
rows between 2 preceding and current row
instance_not_in_window
)
expect:
order: id
columns: ["id int","val string","agg int64"]
rows:
- [3, 'abc', 1]
- [4, 'abc', 2]
- [5, 'abc', 3]
Any extra info ?
May be the issue in macos when index is incorrect.