SNOW-1526571: Mock_iff not working if there are null values in the columns
Please answer these questions before submitting your issue. Thanks!
- What version of Python are you using?
3.11.8
- What operating system and processor architecture are you using?
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- What are the component versions in the environment (pip freeze)?
snowflake-connector-python==3.11.0
snowflake-snowpark-python==1.19.0
- What did you do?
Start a local testing session, create a DataFrame with null values in one column, and combine lead and lag over a window with iff:
import pandas as pd
import snowflake.snowpark.functions as spf
from snowflake.snowpark import Session
from snowflake.snowpark.window import Window

# In the original test, conn_params also carried "schema": self.schema,
# "timezone": gc.timezone_local and **kwargs from the surrounding test class.
# They are omitted here so that the snippet runs standalone.
conn_params = {
    "local_testing": True,
}
session = Session.builder.configs(conn_params).create()
data = [
    (1, 1, 1, None),
    (1, 1, 2, None),
    (1, 1, 3, 1),
    (1, 1, 4, None),
    (1, 1, 5, None),
    (1, 1, 6, 2),
    (1, 1, 7, None),
    (1, 1, 8, None),
]
schema = ["COL1", "COL2", "COL3", "COLUMN_TO_FILL"]
df = session.create_dataframe(
    data=data,
    schema=schema,
)
window = Window.partition_by(["COL1", "COL2"]).order_by("COL3")
lead = spf.lead(df.col("COLUMN_TO_FILL"), ignore_nulls=True).over(window)
lag = spf.lag(df.col("COLUMN_TO_FILL"), ignore_nulls=True).over(window)
max_lead_lag = spf.iff(lead > lag, lead, lag)
df = df.with_column("MAX_LEAD_LAG", max_lead_lag)
result = df.to_pandas()
expected = pd.DataFrame(
    [
        (1, 1, 1, None, None),
        (1, 1, 2, None, None),
        (1, 1, 3, 1, None),
        (1, 1, 4, None, 2.0),
        (1, 1, 5, None, 2.0),
        (1, 1, 6, 2, 1.0),
        (1, 1, 7, None, 2.0),
        (1, 1, 8, None, 2.0),
    ],
    columns=schema + ["MAX_LEAD_LAG"],
)
pd.testing.assert_frame_equal(
    result,
    expected,
    check_dtype=False,
)
- What did you expect to see?
The assertion should pass: result should equal expected.
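For reference, the expected MAX_LEAD_LAG values can be derived by hand. The following is a minimal pure-Python sketch, not Snowpark code: the helper names (lead_ignore_nulls, lag_ignore_nulls, greater_than, iff) are made up for illustration, and it assumes a single partition already ordered by COL3, which matches the data above. It models SQL semantics where a comparison with NULL yields NULL and IFF returns its second expression unless the condition is TRUE.

```python
def lead_ignore_nulls(values):
    """For each row, return the next non-null value after it (None if none)."""
    result = [None] * len(values)
    next_non_null = None
    # Walk backwards so the nearest following non-null value is always known.
    for i in range(len(values) - 1, -1, -1):
        result[i] = next_non_null
        if values[i] is not None:
            next_non_null = values[i]
    return result


def lag_ignore_nulls(values):
    """For each row, return the previous non-null value before it (None if none)."""
    result = []
    prev_non_null = None
    for value in values:
        result.append(prev_non_null)
        if value is not None:
            prev_non_null = value
    return result


def greater_than(a, b):
    """SQL-style comparison: the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a > b


def iff(condition, expr1, expr2):
    """SQL-style IFF: expr1 only when the condition is TRUE, otherwise expr2."""
    return expr1 if condition is True else expr2


# COLUMN_TO_FILL from the reproduction, ordered by COL3 within one partition.
column_to_fill = [None, None, 1, None, None, 2, None, None]

lead = lead_ignore_nulls(column_to_fill)
lag = lag_ignore_nulls(column_to_fill)
max_lead_lag = [iff(greater_than(le, la), le, la) for le, la in zip(lead, lag)]

print(lead)          # [1, 1, 2, 2, 2, None, None, None]
print(lag)           # [None, None, None, 1, 1, 1, 2, 2]
print(max_lead_lag)  # [None, None, None, 2, 2, 1, 2, 2]
```

The last list matches the MAX_LEAD_LAG column in expected, so the expectation is consistent with these semantics.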
The column "MAX_LEAD_LAG" contains only None values. The cause is that the lead and lag columns come back from local testing as NullType columns, so max_lead_lag ends up all None as well. The problem is therefore not really in mock_iff but in the local testing implementations of lead and lag, which return NullType columns. The annotated mock_iff below shows where this surfaces:
@patch("iff")
def mock_iff(condition: ColumnEmulator, expr1: ColumnEmulator, expr2: ColumnEmulator):
    assert isinstance(condition.sf_type.datatype, BooleanType)
    # This is None, since both lead and lag are NullType columns.
    coerce_result = get_coerce_result_type(expr1.sf_type, expr2.sf_type)
    if all(condition) or all(~condition) or coerce_result is not None:
        res = ColumnEmulator(data=[None] * len(condition), dtype=object)
        expr1 = cast_column_to(expr1, coerce_result)  # This returns empty
        expr2 = cast_column_to(expr2, coerce_result)  # This returns empty
        res.where(condition, other=expr2, inplace=True)
        res.where([not x for x in condition], other=expr1, inplace=True)
        res.sf_type = coerce_result
        return res  # Now this is full of Nones too
    else:
        raise SnowparkLocalTestingException(
            f"[Local Testing] expr1 and expr2 have conflicting datatypes that cannot be coerced: {expr1.sf_type} <-> {expr2.sf_type}"
        )