kaskada bug: unexpected non-null behavior seen from `when` function


Open • jordanrfrazier opened this issue 1 year ago • 1 comment

Description

`when(condition)` is expected to filter out rows where the condition is false or null, and it does. However, if the output of a `when` is merged with other rows in the same query (for example, when other fields are computed alongside it in a record), something populates that output with a non-null value. It appears to be the last non-null value, which suggests that the merge is caching the value incorrectly (an interpolation issue, perhaps?).
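For reference, the intended filtering semantics fit in a few lines. The sketch below covers a single entity and assumes rows are `(time, value)` pairs with a parallel list of nullable conditions. `when` here is a hypothetical stand-in, not Kaskada's implementation:

```rust
/// Hypothetical model of `value | when(condition)` for one entity.
/// A row is kept only when its condition is `Some(true)`; rows whose
/// condition is `Some(false)` or null (`None`) are dropped entirely.
fn when<T: Clone>(rows: &[(u64, T)], conditions: &[Option<bool>]) -> Vec<(u64, T)> {
    rows.iter()
        .zip(conditions)
        // Only an explicit `true` passes; false and null are filtered out.
        .filter(|(_, cond)| **cond == Some(true))
        .map(|(row, _)| row.clone())
        .collect()
}
```

Given Ben's first five rows from the reproduction, this keeps only the counts at 2022-10-27 and 2022-11-02. That matches the single-feature output further down.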

To Reproduce

Steps to reproduce the behavior:

Run the ignored test `test_when_output_resets_to_null` in `when_tests.rs`.

Actual Behavior

Running the following test shows the problem.

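```rust
#[tokio::test]
#[ignore] // Ignored until this bug is fixed.
async fn test_when_output_resets_to_null() {
    // Expected: `count_when_valid` is null wherever `purchase_is_valid` is null or false.
    // The inline snapshot is a placeholder; insta reports the actual CSV as a diff.
    insta::assert_snapshot!(QueryFixture::new("{ \
        count_page: count(PageViews), \
        purchase_is_valid: is_valid(Purchases), \
        count_when_valid: count(PageViews) | when(is_valid(Purchases)) }")
        .run_to_csv(&purchase_fixture().await)
        .await
        .unwrap(), @r###"
    _time,_subsort,_key_hash,_key,sum_field
    "###);
}
```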

Output:

```
_time,_subsort,_key_hash,_key,count_page,purchase_is_valid,count_when_valid
2022-10-25T00:00:00.000000000,15615443869102979449,1644192944307425184,Davor,1,,
2022-10-26T00:00:00.000000000,15615443869102979450,12688524802574118068,Ben,1,,
2022-10-27T00:00:00.000000000,1305746571793151907,12688524802574118068,Ben,1,true,1
2022-10-27T00:00:00.000000000,1305746571793151908,1644192944307425184,Davor,1,true,1
2022-10-28T00:00:00.000000000,15615443869102979451,12688524802574118068,Ben,2,,1
2022-11-01T00:00:00.000000000,15615443869102979452,12688524802574118068,Ben,3,,1
2022-11-01T00:00:00.000000000,15615443869102979453,1644192944307425184,Davor,2,,1
2022-11-02T00:00:00.000000000,1305746571793151909,12688524802574118068,Ben,3,true,3
2022-11-02T00:00:00.000000000,1305746571793151910,1644192944307425184,Davor,2,true,2
2022-11-24T00:00:00.000000000,15615443869102979454,1644192944307425184,Davor,3,,2
2022-11-25T00:00:00.000000000,15615443869102979455,1644192944307425184,Davor,4,,2
2022-11-26T00:00:00.000000000,15615443869102979456,1644192944307425184,Davor,5,,2
2022-11-27T00:00:00.000000000,1305746571793151911,1644192944307425184,Davor,5,true,5
2022-12-10T00:00:00.000000000,15615443869102979457,12688524802574118068,Ben,4,,3
2022-12-12T00:00:00.000000000,1305746571793151912,12688524802574118068,Ben,4,true,4
2023-01-01T00:00:00.000000000,1305746571793151913,12688524802574118068,Ben,4,true,4
2023-01-01T00:00:00.000000000,15615443869102979459,1644192944307425184,Davor,6,,5
2023-02-07T00:00:00.000000000,15615443869102979460,12688524802574118068,Ben,5,,4
2023-12-31T00:00:00.000000000,15615443869102979458,12688524802574118068,Ben,6,,4
```

Expected Behavior

`count_when_valid` should be null whenever `purchase_is_valid` is null or false.
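The expected behavior can be stated as a row-level invariant and checked directly against the CSV. The sketch below assumes the column layout of the actual output in this issue. `violations` is a hypothetical helper, not part of Kaskada's test fixtures:

```rust
/// Return the `_time` of every CSV row where `count_when_valid` is non-null
/// even though `purchase_is_valid` is null (empty) or not `true`.
/// Columns are located by name from the header row.
fn violations(csv: &str) -> Vec<&str> {
    let mut lines = csv.lines();
    let header: Vec<&str> = lines.next().unwrap_or("").split(',').collect();
    let col = |name: &str| header.iter().position(|h| *h == name);
    let (Some(valid_idx), Some(count_idx)) = (col("purchase_is_valid"), col("count_when_valid")) else {
        return Vec::new(); // Required columns are missing: nothing to check.
    };
    lines
        .filter_map(|line| {
            let fields: Vec<&str> = line.split(',').collect();
            let valid = fields.get(valid_idx).copied().unwrap_or("");
            let count = fields.get(count_idx).copied().unwrap_or("");
            // A value is only allowed on rows where the condition was true.
            (valid != "true" && !count.is_empty()).then(|| fields[0])
        })
        .collect()
}
```

On the full actual output, this flags ten rows, starting with Ben at 2022-10-28.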

Additional context

`when` produces discrete values, so the last non-null value should not be cached anywhere. Running the test with only the final feature illustrates the difference:

```rust
#[tokio::test]
async fn test_when_output_resets_to_null() {
    // Same query, keeping only the `count_when_valid` field.
    insta::assert_snapshot!(QueryFixture::new("{ \
        count_when_valid: count(PageViews) | when(is_valid(Purchases)) }")
        .run_to_csv(&purchase_fixture().await)
        .await
        .unwrap(), @r###"
    _time,_subsort,_key_hash,_key,sum_field
    "###);
}
```

Output:

```
_time,_subsort,_key_hash,_key,count_when_valid
2022-10-27T00:00:00.000000000,1305746571793151907,12688524802574118068,Ben,1
2022-10-27T00:00:00.000000000,1305746571793151908,1644192944307425184,Davor,1
2022-11-02T00:00:00.000000000,1305746571793151909,12688524802574118068,Ben,3
2022-11-02T00:00:00.000000000,1305746571793151910,1644192944307425184,Davor,2
2022-11-27T00:00:00.000000000,1305746571793151911,1644192944307425184,Davor,5
2022-12-12T00:00:00.000000000,1305746571793151912,12688524802574118068,Ben,4
2023-01-01T00:00:00.000000000,1305746571793151913,12688524802574118068,Ben,4
```
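The two outputs differ in how the discrete `when` result is aligned onto the merged rows. The sketch below contrasts the expected alignment with a carry-forward merge that reproduces the observed values. Both functions are hypothetical illustrations for a single entity keyed by a `u64` time, not Kaskada code:

```rust
/// Expected (discrete) alignment: a merged row gets a value only if the
/// `when` output produced one at exactly that time; nothing carries forward.
fn merge_discrete(timeline: &[u64], output: &[(u64, u64)]) -> Vec<Option<u64>> {
    timeline
        .iter()
        .map(|t| output.iter().find(|(ot, _)| ot == t).map(|(_, v)| *v))
        .collect()
}

/// Hypothetical reconstruction of the observed behavior: the last non-null
/// value is cached and repeated on later rows, as if the value were continuous.
fn merge_carry_forward(timeline: &[u64], output: &[(u64, u64)]) -> Vec<Option<u64>> {
    let mut last = None;
    timeline
        .iter()
        .map(|t| {
            if let Some((_, v)) = output.iter().find(|(ot, _)| ot == t) {
                last = Some(*v); // Cache the newest value...
            }
            last // ...and emit it even on rows where `when` produced nothing.
        })
        .collect()
}
```

For Ben's first six rows (2022-10-26 through 2022-12-10), `merge_carry_forward` yields null, then `1, 1, 1, 3, 3`, matching the `count_when_valid` column of the multi-field output. `merge_discrete` yields values only on 2022-10-27 and 2022-11-02.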

Apr 28 '23 16:04 jordanrfrazier

Possibly related: I've seen cases where these two expressions produce different results:

```
foo | when(p1) | when(p2)
foo | when(p1 and p2)
```

This may be more predictable if we only produce discrete values where the RHS is defined at the time of the LHS (i.e., don't fabricate null rows).
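This proposal can be sketched over sparse timelines. Assume values exist only at the times they are defined, and assume `and` follows three-valued (Kleene) logic. Under both assumptions, the chained and combined forms select the same rows. `Timeline`, `when`, and `and` are hypothetical names for illustration, not Kaskada APIs:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Sparse per-entity timeline: a value exists only at times where it is defined.
type Timeline<V> = BTreeMap<u64, V>;

/// `lhs | when(cond)`: emit an LHS value only where the condition is defined
/// and true at that same time. No null rows are fabricated.
fn when<V: Clone>(lhs: &Timeline<V>, cond: &Timeline<Option<bool>>) -> Timeline<V> {
    lhs.iter()
        .filter(|(t, _)| cond.get(*t) == Some(&Some(true)))
        .map(|(t, v)| (*t, v.clone()))
        .collect()
}

/// Three-valued `and` (assumed semantics), aligned by time; absent means null.
fn and(a: &Timeline<Option<bool>>, b: &Timeline<Option<bool>>) -> Timeline<Option<bool>> {
    let times: BTreeSet<u64> = a.keys().chain(b.keys()).copied().collect();
    times
        .into_iter()
        .map(|t| {
            let x = a.get(&t).copied().flatten();
            let y = b.get(&t).copied().flatten();
            let r = match (x, y) {
                (Some(false), _) | (_, Some(false)) => Some(false),
                (Some(true), Some(true)) => Some(true),
                _ => None, // Any null without a false operand stays null.
            };
            (t, r)
        })
        .collect()
}
```

With this rule, `when(&when(&foo, &p1), &p2)` and `when(&foo, &and(&p1, &p2))` both keep exactly the times where `p1` and `p2` are true together.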

May 05 '23 16:05 kerinin
