kaskada
bug: unexpected non-null behavior seen from `when` function
Description
`when(condition)` is expected to filter out rows where the condition is `false` or null. This works as expected. However, if the output of a `when` is merged with other rows (for example, by combining it with other fields in the same record), something seems to fill in that output with a non-null value. I think it is the last non-null value, which would imply that the merge is caching the value incorrectly. Perhaps an interpolation issue?
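For reference, the filtering semantics described above can be modeled in plain Rust. This is a hypothetical sketch, not Kaskada's implementation: `Row` and `when_filter` are illustrative names, and a null condition is represented as `None`.

```rust
/// One input row: a timestamp, a value, and a condition that may be null.
/// Hypothetical model for illustration only; not Kaskada's internal representation.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    time: u64,
    value: i64,
    /// `None` stands in for a null condition.
    condition: Option<bool>,
}

/// Models the expected behavior of `value | when(condition)`: a row is kept
/// only when the condition is `true`; rows where it is `false` or null are
/// dropped rather than emitted.
fn when_filter(rows: &[Row]) -> Vec<(u64, i64)> {
    rows.iter()
        .filter(|r| r.condition == Some(true))
        .map(|r| (r.time, r.value))
        .collect()
}

fn main() {
    let rows = vec![
        Row { time: 1, value: 1, condition: None },
        Row { time: 2, value: 1, condition: Some(true) },
        Row { time: 3, value: 2, condition: Some(false) },
        Row { time: 4, value: 3, condition: Some(true) },
    ];
    // Only the rows at times 2 and 4 survive.
    println!("{:?}", when_filter(&rows)); // prints [(2, 1), (4, 3)]
}
```

In this model a dropped row simply does not exist in the output, so there is no value left for a later step to fill in.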
To Reproduce
Steps to reproduce the behavior:
- Run the ignored test `test_when_output_resets_to_null` in `when_tests.rs`.
Actual Behavior
The results show:
```rust
async fn test_when_output_resets_to_null() {
    insta::assert_snapshot!(QueryFixture::new("{ \
        count_page: count(PageViews), \
        purchase_is_valid: is_valid(Purchases), \
        count_when_valid: count(PageViews) | when(is_valid(Purchases)) }").run_to_csv(&purchase_fixture().await).await.unwrap(), @r###"
    _time,_subsort,_key_hash,_key,sum_field
    "###);
}
```
```csv
_time,_subsort,_key_hash,_key,count_page,purchase_is_valid,count_when_valid
2022-10-25T00:00:00.000000000,15615443869102979449,1644192944307425184,Davor,1,,
2022-10-26T00:00:00.000000000,15615443869102979450,12688524802574118068,Ben,1,,
2022-10-27T00:00:00.000000000,1305746571793151907,12688524802574118068,Ben,1,true,1
2022-10-27T00:00:00.000000000,1305746571793151908,1644192944307425184,Davor,1,true,1
2022-10-28T00:00:00.000000000,15615443869102979451,12688524802574118068,Ben,2,,1
2022-11-01T00:00:00.000000000,15615443869102979452,12688524802574118068,Ben,3,,1
2022-11-01T00:00:00.000000000,15615443869102979453,1644192944307425184,Davor,2,,1
2022-11-02T00:00:00.000000000,1305746571793151909,12688524802574118068,Ben,3,true,3
2022-11-02T00:00:00.000000000,1305746571793151910,1644192944307425184,Davor,2,true,2
2022-11-24T00:00:00.000000000,15615443869102979454,1644192944307425184,Davor,3,,2
2022-11-25T00:00:00.000000000,15615443869102979455,1644192944307425184,Davor,4,,2
2022-11-26T00:00:00.000000000,15615443869102979456,1644192944307425184,Davor,5,,2
2022-11-27T00:00:00.000000000,1305746571793151911,1644192944307425184,Davor,5,true,5
2022-12-10T00:00:00.000000000,15615443869102979457,12688524802574118068,Ben,4,,3
2022-12-12T00:00:00.000000000,1305746571793151912,12688524802574118068,Ben,4,true,4
2023-01-01T00:00:00.000000000,1305746571793151913,12688524802574118068,Ben,4,true,4
2023-01-01T00:00:00.000000000,15615443869102979459,1644192944307425184,Davor,6,,5
2023-02-07T00:00:00.000000000,15615443869102979460,12688524802574118068,Ben,5,,4
2023-12-31T00:00:00.000000000,15615443869102979458,12688524802574118068,Ben,6,,4
```
Expected Behavior
Expected the value of `count_when_valid` to be null whenever `purchase_is_valid` is either null or `false`.
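The expected behavior can be stated as a per-row rule. The sketch below uses a hypothetical helper, `expected_count_when_valid`, together with Ben's rows transcribed from the actual output, and reports the rows where the observed `count_when_valid` differs from that rule.

```rust
/// Hypothetical helper: the value `count_when_valid` should take on a row,
/// given that row's `count_page` and `purchase_is_valid` columns.
fn expected_count_when_valid(count_page: i64, purchase_is_valid: Option<bool>) -> Option<i64> {
    match purchase_is_valid {
        // `when` passes the count through only when the predicate is true.
        Some(true) => Some(count_page),
        // Both `false` and null (`None`) should produce a null output.
        Some(false) | None => None,
    }
}

/// Ben's rows from the combined output:
/// (count_page, purchase_is_valid, observed count_when_valid).
const BEN_ROWS: [(i64, Option<bool>, Option<i64>); 10] = [
    (1, None, None),
    (1, Some(true), Some(1)),
    (2, None, Some(1)),
    (3, None, Some(1)),
    (3, Some(true), Some(3)),
    (4, None, Some(3)),
    (4, Some(true), Some(4)),
    (4, Some(true), Some(4)),
    (5, None, Some(4)),
    (6, None, Some(4)),
];

fn main() {
    // Print every row where the observed value disagrees with the rule.
    for (count_page, valid, observed) in BEN_ROWS {
        let expected = expected_count_when_valid(count_page, valid);
        if expected != observed {
            println!(
                "count_page={} purchase_is_valid={:?}: expected {:?}, observed {:?}",
                count_page, valid, expected, observed
            );
        }
    }
}
```

For Ben this flags five rows: every row where `purchase_is_valid` is null, except the first one, which precedes any valid purchase.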
Additional context
`when` produces discrete values, which means we should not be caching the last non-null value anywhere. Running the test with only the final field illustrates the difference:
```rust
async fn test_when_output_resets_to_null() {
    insta::assert_snapshot!(QueryFixture::new("{ \
        count_when_valid: count(PageViews) | when(is_valid(Purchases)) }").run_to_csv(&purchase_fixture().await).await.unwrap(), @r###"
    _time,_subsort,_key_hash,_key,sum_field
    "###);
}
```
```csv
_time,_subsort,_key_hash,_key,count_when_valid
2022-10-27T00:00:00.000000000,1305746571793151907,12688524802574118068,Ben,1
2022-10-27T00:00:00.000000000,1305746571793151908,1644192944307425184,Davor,1
2022-11-02T00:00:00.000000000,1305746571793151909,12688524802574118068,Ben,3
2022-11-02T00:00:00.000000000,1305746571793151910,1644192944307425184,Davor,2
2022-11-27T00:00:00.000000000,1305746571793151911,1644192944307425184,Davor,5
2022-12-12T00:00:00.000000000,1305746571793151912,12688524802574118068,Ben,4
2023-01-01T00:00:00.000000000,1305746571793151913,12688524802574118068,Ben,4
```
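The difference between the two runs can be checked against a simple model. The sketch below uses hypothetical helpers, `discrete_at` and `as_of`, and ignores `_subsort` and keys, matching on the date alone. It compares a discrete lookup, which only yields a value when the isolated `when` output has a row at exactly that time, with a carry-forward lookup that returns the latest value at or before that time. Applied to Davor's rows, the carry-forward lookup reproduces the observed `count_when_valid` column, while the discrete lookup yields the expected nulls.

```rust
// Hypothetical helpers for illustration. Times are ISO-8601 dates, which sort
// correctly as strings. `events` holds one key's `when` output, sorted by time.

/// Discrete: a value exists only if the `when` output has a row at exactly `t`.
fn discrete_at(events: &[(&str, i64)], t: &str) -> Option<i64> {
    events.iter().find(|(et, _)| *et == t).map(|(_, v)| *v)
}

/// Carry-forward ("as of"): the latest value at or before `t`.
fn as_of(events: &[(&str, i64)], t: &str) -> Option<i64> {
    events.iter().rev().find(|(et, _)| *et <= t).map(|(_, v)| *v)
}

fn main() {
    // Davor's rows from the isolated `when` run.
    let events: [(&str, i64); 3] = [("2022-10-27", 1), ("2022-11-02", 2), ("2022-11-27", 5)];
    // Dates of Davor's rows in the combined output.
    let rows = [
        "2022-10-25", "2022-10-27", "2022-11-01", "2022-11-02", "2022-11-24",
        "2022-11-25", "2022-11-26", "2022-11-27", "2023-01-01",
    ];
    for t in rows {
        println!(
            "{}: discrete={:?} as_of={:?}",
            t,
            discrete_at(&events, t),
            as_of(&events, t)
        );
    }
}
```

The match between the `as_of` column and the observed output is what suggests the merge is interpolating the `when` output instead of treating it as discrete.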
Possibly related: I've seen cases where these two expressions produce different results:
- `foo | when(p1) | when(p2)`
- `foo | when(p1 and p2)`
This may be more predictable if we only produce discrete values where the RHS is defined at the time of the LHS (i.e., don't fabricate null rows).
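If `when` behaved as a pure per-row filter over discrete rows, the two expressions above should be interchangeable. The sketch below checks this on a single row for every combination of `true`, `false`, and null predicates. The helper names `and3` and `when` are hypothetical, and modeling `and` with three-valued (Kleene) logic is an assumption.

```rust
// Hypothetical per-row model; `None` stands in for null.

/// Three-valued (Kleene) AND. Assumption: Kaskada's `and` is modeled this way.
/// The equivalence checked below also holds if `and` returns null whenever
/// either side is null, since both variants yield `true` only when both inputs are `true`.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Per-row `when`: pass the value through only if the predicate is `true`.
fn when(value: Option<i64>, pred: Option<bool>) -> Option<i64> {
    if pred == Some(true) { value } else { None }
}

fn main() {
    let preds = [None, Some(false), Some(true)];
    let v = Some(42);
    // Exhaustively compare `v | when(p1) | when(p2)` with `v | when(p1 and p2)`.
    for p1 in preds {
        for p2 in preds {
            let chained = when(when(v, p1), p2);
            let conjoined = when(v, and3(p1, p2));
            assert_eq!(chained, conjoined, "p1={:?} p2={:?}", p1, p2);
        }
    }
    println!("chained and conjoined `when` agree on every combination");
}
```

Because the two forms agree in this model, a difference in practice would have to come from rows where one predicate is evaluated at a time it is not defined, which fits the idea of not fabricating null rows.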