snowpark-python

SNOW-1075566: Patching function with no argument

Open petsvakala opened this issue 1 year ago • 9 comments

Hi,

I was following the instructions on how to patch built-in functions (https://docs.snowflake.com/en/developer-guide/snowpark/python/testing-locally#patching-built-in-functions), but I am not sure how to do that for the current_date() function.

This is how I approached it:

@patch(current_date)
def patch_current_date() -> ColumnEmulator:
    ret_column = ColumnEmulator(data=[datetime.date.today()])
    ret_column.sf_type = ColumnType(DateType(), True)
    return ret_column

But this only fills the first row of the DataFrame; the rest of the rows in that column are NA.

This is what the relevant line looks like in my test function: input_df.with_column('CURRENT_DATE', current_date())

petsvakala avatar Feb 22 '24 20:02 petsvakala

Hi @petsvakala, thanks for reaching out. @sfc-gh-jrose, I know you added support for current_date recently. Can you take a look at this issue and see whether it is covered?

sfc-gh-aling avatar Feb 23 '24 18:02 sfc-gh-aling

I did add current_date recently, but I don't think it has made it into a release yet. I believe the problem in this bug report is this line:

    ret_column = ColumnEmulator(data=[datetime.date.today()])

The column emulator assumes that the data is the same length as the column and inserts None once no data remains in the list. If you remove the list brackets, it will instead be a single value that is used for every entry in the column:

    ret_column = ColumnEmulator(data=datetime.date.today())

sfc-gh-jrose avatar Feb 23 '24 18:02 sfc-gh-jrose

Thank you for the quick reply. I tried what you recommended (removed the list brackets) but I still face the same issue:

@patch(current_date)
def patch_current_date() -> ColumnEmulator:
    ret_column = ColumnEmulator(data=datetime.date.today())
    ret_column.sf_type = ColumnType(DateType(), True)
    return ret_column

petsvakala avatar Feb 23 '24 18:02 petsvakala

I was wrong. This appears to be a gap in the local testing API. I'll see if support can be added by the next release.
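
A note for readers: one plausible way to see why both variants of the patch fill only one row is to reproduce the behaviour in plain pandas. The sketch below assumes that ColumnEmulator behaves like a pandas Series with respect to construction and index alignment; that is an assumption for illustration, not something confirmed in this thread, and no Snowpark code is involved.

```python
import datetime

import pandas as pd

today = datetime.date.today()

# Assumption: ColumnEmulator mirrors pandas.Series here.
# Both constructions produce a Series with exactly one element (index 0).
from_list = pd.Series([today])
from_scalar = pd.Series(today)
print(len(from_list), len(from_scalar))

# A three-row frame, shaped like the input in the report.
frame = pd.DataFrame({"ID": ["A1", "A1", "A2"], "ORDER_TOTAL": [50.0, 50.0, 80.0]})

# Column assignment aligns on the index: only row 0 has a match,
# so rows 1 and 2 are filled with NaN.
frame["CURRENT_DATE"] = from_scalar
print(frame)
```

Under that assumption, removing the list brackets cannot help, because a scalar without an explicit index still yields a one-element column.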

sfc-gh-jrose avatar Feb 23 '24 20:02 sfc-gh-jrose

OK. Which close status should I choose for this issue for the time being, or should I keep it open?

petsvakala avatar Feb 26 '24 11:02 petsvakala

@petsvakala, can you retry with v1.14.0? It should be fixed now.

sfc-gh-jfreeberg avatar Apr 11 '24 22:04 sfc-gh-jfreeberg

Hi,

Unfortunately, I still see the same behaviour.

Here you can see that the new package is installed (output of poetry show snowflake-snowpark-python):

 name         : snowflake-snowpark-python     
 version      : 1.14.0                        
 description  : Snowflake Snowpark for Python 

dependencies
 - cloudpickle >=1.6.0,<2.1.0 || >2.1.0,<2.2.0 || >2.2.0,<=2.2.1
 - cloudpickle 2.2.1
 - pyyaml *
 - setuptools >=40.6.0
 - snowflake-connector-python >=3.6.0,<4.0.0
 - typing-extensions >=4.1.0,<5.0.0

Here is a minimal example for reproducibility:

import pytest
import pandas as pd
import snowflake.snowpark.session as ses
from snowflake.snowpark.functions import current_date


@pytest.mark.data_processing
def test_calculate_rfm(request, session: ses.Session) -> None:
    if request.config.getoption('--snowflake-session') == 'local':
        from tests.patches import patch_current_date

    ID = ["A1", "A1", "A2"]
    ORDER_TOTAL = [50.0, 50.0, 80.0]
    data = {'ID': ID, 'ORDER_TOTAL': ORDER_TOTAL}  # renamed to avoid shadowing the built-in dict
    df = pd.DataFrame(data)
    input_df = session.create_dataframe(df)

    # This assigns the current date to the first row only
    snowpark_df = (input_df.with_column('CURRENT_DATE', current_date())).to_pandas()

    # Here I create a two-row DataFrame, but only a single row is returned
    snowpark_df2 = session.create_dataframe([[1, 'a', True], [3, 'b', False]]).select(current_date()).to_pandas()

    assert 1 == 1

This is what the patch looks like at the moment:

@patch(current_date)
def patch_current_date() -> ColumnEmulator:
    ret_column = ColumnEmulator(data=datetime.date.today())
    ret_column.sf_type = ColumnType(DateType(), True)
    return ret_column

petsvakala avatar Apr 12 '24 08:04 petsvakala

Hey @petsvakala, we have added a new feature to our patching functions that passes the number of rows. We also now have built-in mocking support for the current_date function, so you no longer need to patch it yourself: https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/mock/_functions.py#L523-L528

Could you try upgrading to the latest version of Snowpark Python and see if that resolves the issue?
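
A note for readers: the idea of giving a patch function the number of rows can be illustrated in plain pandas. This is only a sketch of the concept, not Snowpark's patch signature; the helper name constant_date_column is hypothetical, and the use of a pandas Series in place of ColumnEmulator is an assumption.

```python
import datetime

import pandas as pd


def constant_date_column(index: pd.Index) -> pd.Series:
    """Hypothetical helper: repeat today's date once per row of `index`."""
    today = datetime.date.today()
    # One value per row, carrying the same index as the target frame,
    # so index alignment finds a match for every row.
    return pd.Series([today] * len(index), index=index)


frame = pd.DataFrame({"ID": ["A1", "A1", "A2"], "ORDER_TOTAL": [50.0, 50.0, 80.0]})
frame["CURRENT_DATE"] = constant_date_column(frame.index)
print(frame)
```

Because the column is built to match the row count, no row is left without a value.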

sfc-gh-aling avatar Jul 22 '24 18:07 sfc-gh-aling

Hi @sfc-gh-aling, yes, it seems to be working now. Thank you again!

petsvakala avatar Aug 26 '24 12:08 petsvakala