
SNOW-1227759: ambiguous overload typing causing typing error false positives with DataFrameWriter.save_as_table()

Status: Open · TedCha opened this issue 11 months ago · 6 comments

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]

  2. What operating system and processor architecture are you using?

Windows-10-10.0.19045-SP0

  3. What are the component versions in the environment (pip freeze)?

asn1crypto==1.5.1
certifi==2023.11.17      
cffi==1.16.0
charset-normalizer==3.3.2
cloudpickle==2.2.1       
colorama==0.4.6
cryptography==41.0.7     
exceptiongroup==1.2.0    
filelock==3.13.1
idna==3.6
iniconfig==2.0.0
numpy==1.24.4
packaging==23.2
pandas==2.0.3
platformdirs==3.11.0     
pluggy==1.3.0
ply==3.11
pyarrow==14.0.2
pycparser==2.21
PyJWT==2.8.0
pyOpenSSL==23.3.0
pytest==7.4.4
pytest-mock==3.12.0
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
PyYAML==6.0.1
requests==2.31.0
six==1.16.0
snowflake-connector-python==3.6.0
snowflake-snowpark-python==1.11.1
sortedcontainers==2.4.0
sqlglot==22.2.0
sqlglotrs==0.1.2
tomli==2.0.1
tomlkit==0.12.3
typing_extensions==4.9.0
tzdata==2023.4
urllib3==1.26.18

  4. What did you do?

When the DataFrameWriter.save_as_table() method is called without the clustering_keys parameter, the Pylance type checker reports the following error:

No overloads for "save_as_table" match the provided arguments

Snippet:

def main(session: Session):
    test_df_1 = session.create_dataframe([])

    # No overloads for "save_as_table" match the provided arguments Argument types: (Literal['table_name'], Literal['temporary'])
    test_df_1.write.mode("overwrite").save_as_table(
        "table_name",
        table_type="temporary"
    )

    test_df_2 = session.create_dataframe([])
    
    # No error
    test_df_2.write.mode("overwrite").save_as_table(
        "table_name",
        table_type="temporary",
        clustering_keys=[]
    )

  5. What did you expect to see?

No type checking error when using the save_as_table() method as described in the documentation.

  6. Can you set logging to DEBUG and collect the logs?

N/A; this is a static type checking issue.

Note:

I think this issue could be resolved by making the clustering_keys parameter optional in all overloads:

Ex:

    @overload
    def save_as_table(
        self,
        table_name: Union[str, Iterable[str]],
        *,
        mode: Optional[str] = None,
        column_order: str = "index",
        create_temp_table: bool = False,
        table_type: Literal["", "temp", "temporary", "transient"] = "",
        clustering_keys: Iterable[Column], # Change to Optional[Iterable[ColumnOrName]] = None
        statement_params: Optional[Dict[str, str]] = None,
        block: bool = True,
    ) -> None:
        ...  # pragma: no cover

    @overload
    def save_as_table(
        self,
        table_name: Union[str, Iterable[str]],
        *,
        mode: Optional[str] = None,
        column_order: str = "index",
        create_temp_table: bool = False,
        table_type: Literal["", "temp", "temporary", "transient"] = "",
        clustering_keys: Iterable[Column], # Change to Optional[Iterable[ColumnOrName]] = None
        statement_params: Optional[Dict[str, str]] = None,
        block: bool = False,
    ) -> AsyncJob:
        ...  # pragma: no cover
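
To make the failure concrete, here is a self-contained sketch of the same pattern (toy names, not the real Snowpark signatures): because clustering_keys has no default in either overload, a call that omits it matches neither overload, even though the runtime implementation supplies a default.

    from typing import Iterable, Optional, overload

    class AsyncJob: ...  # stand-in for snowflake.snowpark.AsyncJob

    @overload
    def save(table_name: str, *, clustering_keys: Iterable[str],
             block: bool = True) -> None: ...

    @overload
    def save(table_name: str, *, clustering_keys: Iterable[str],
             block: bool = False) -> AsyncJob: ...

    # Runtime implementation: clustering_keys has a default here, but the
    # overloads above are what the type checker sees.
    def save(table_name: str, *, clustering_keys: Optional[Iterable[str]] = None,
             block: bool = True) -> Optional[AsyncJob]:
        return None if block else AsyncJob()

    save("t")                      # Pyright: No overloads for "save" match the provided arguments
    save("t", clustering_keys=[])  # OK: matches the first overload

Giving clustering_keys a default of None in both overloads, as suggested above, makes the first call type-check.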

TedCha avatar Mar 08 '24 15:03 TedCha

Hello @TedCha ,

Thank you for raising the issue. I tried the code snippet you provided in a Jupyter notebook with Snowpark Python 1.11.1; it works fine and no error is thrown. Could you please check?

test_df_1 = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
test_df_1.write.mode("overwrite").save_as_table(
    "table_name",
    table_type="temporary"
)
session.table("table_name").collect()

Output: [Row(A=1, B=2), Row(A=3, B=4)]

test_df_2 = session.create_dataframe([[5, 6], [7, 8]], schema=["a", "b"])
test_df_2.write.mode("overwrite").save_as_table(
    "table_name",
    table_type="temporary",
    clustering_keys=[]
)
session.table("table_name").collect()

Output: [Row(A=5, B=6), Row(A=7, B=8)]

Regards, Sujan

sfc-gh-sghosh avatar Mar 10 '24 07:03 sfc-gh-sghosh

Hello @sfc-gh-sghosh. No error is thrown at runtime; the error is reported during static type checking. Please see the attached screenshot.

[screenshot: the type checking error in the IDE]
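
If it helps, the error can also be reproduced outside of any IDE with the Pyright CLI (a sketch; the pyright package from PyPI and the file name repro.py are assumptions):

    # repro.py -- reproduce with: pip install pyright && pyright repro.py
    from snowflake.snowpark import Session

    def main(session: Session) -> None:
        df = session.create_dataframe([[1, 2]], schema=["a", "b"])
        # Pyright reports on the next call:
        #   error: No overloads for "save_as_table" match the provided arguments
        df.write.mode("overwrite").save_as_table("table_name", table_type="temporary")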

TedCha avatar Mar 11 '24 12:03 TedCha

Hello @TedCha ,

I tried snowflake-connector-python 3.7.0 and Snowpark Python 1.11.1; there is no static type checking error, and it runs successfully at runtime as well.

Could you please paste the full message, and could you try a fresh run in another IDE such as Jupyter?

[screenshot: Python_jupyter]

Regards, Sujan

sfc-gh-sghosh avatar Mar 18 '24 05:03 sfc-gh-sghosh

Hello Sujan,

Thank you for looking into this issue. The issue is not an error happening at runtime; it is an error reported during static type checking, before runtime.

Jupyter cannot perform static type checking without a language server installed. I was able to recreate the described issue in JupyterLab by installing the LSP integration for Jupyter and then installing the Pyright language server. Pyright is the language server I am using in my IDE (VS Code), but this error could surface in multiple Python language servers.

Please see attached screenshots for reference.

[screenshots: the Pyright error recreated in JupyterLab]

TedCha avatar Mar 18 '24 14:03 TedCha

I believe https://github.com/snowflakedb/snowpark-python/issues/1058 is the same issue.

TedCha avatar Mar 19 '24 17:03 TedCha

Hello @TedCha ,

Thanks for the update; we are checking.

Regards, Sujan

sfc-gh-sghosh avatar Mar 20 '24 12:03 sfc-gh-sghosh