SNOW-1367643: When/Otherwise throws a KeyError depending on the order of the select vs where clause
Please answer these questions before submitting your issue. Thanks!
-
What version of Python are you using?
Python 3.10.4 (main, May 26 2022, 13:33:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
-
What are the Snowpark Python and pandas versions in the environment?
pandas==2.2.1 snowflake-snowpark-python==1.15.0
-
What did you do? I'm trying to test a when().otherwise() operation and it throws a KeyError exception if the where() clause is placed before the select() clause.
import snowflake.snowpark as sp
import snowflake.snowpark.functions as sf
import snowflake.snowpark.types as st
mock_session = sp.Session.builder.config("local_testing", True).create()
inputs = mock_session.create_dataframe([
sp.Row(ID=1, FIRST_NAME="Homer", LAST_NAME="Simpson"),
sp.Row(ID=2, FIRST_NAME="Marge", LAST_NAME="Simpson"),
sp.Row(ID=3, FIRST_NAME="Clark", LAST_NAME="Kent"),
sp.Row(ID=4, FIRST_NAME="Bruce", LAST_NAME="Wayne")
])
# Note: if you place the select(...) before the where(...) call, this works.
df = inputs.where(
(sf.col("ID") < 10) &
(sf.col("FIRST_NAME").in_("Marge", "Clark"))
).select(
sf.col("ID"),
sf.col("FIRST_NAME"),
sf.when(
sf.col("LAST_NAME").in_("Kent", "Bruce"),
sf.lit("---REDACTED---").astype(st.StringType())
).otherwise(
sf.col("LAST_NAME")
).alias("LAST_NAME")
)
results = df.collect()
KeyError: [2] not in index
As noted in the comment, changing the order of operations by calling select(...) before where(...) works. However, this won't work for all use cases since there are cases where a select doesn't include a column that's filtered on. For example, given the inputs above:
# works in Snowpark connected to our Snowflake instance, does not work under test.
df = inputs.select(
sf.col("FIRST_NAME"),
sf.col("LAST_NAME")
).where(
(sf.col("ID") == 1)
)
results = df.collect()
KeyError: '"ID"'
- What did you expect to see?
I expected two results:
[Row(ID=2, FIRST_NAME='Marge', LAST_NAME='Simpson'),
Row(ID=3, FIRST_NAME='Clark', LAST_NAME='---REDACTED---')]