PySpark - Incompatible parameter type & Unsupported operand
Pyre Bug
Bug description
PySpark DataFrame/Column expressions trigger Unsupported operand [58] and Incompatible parameter type [6] errors, even though they are valid and are even suggested in the Spark documentation.
Reproduction steps
Python snippet sample.py:
from pyspark.sql import SparkSession, functions as f
spark = SparkSession.builder.getOrCreate()
df = spark.sql("select 1 as num")
(
df
.withColumn("num", f.col("num") + 2)
.withColumn("num", f.col("num") - 2)
.withColumn("num", f.col("num") * 2)
.withColumn("num", f.col("num") / 2)
.filter(f.col("num") > 1)
.filter(f.col("num") >= 1)
.filter(f.col("num") < 1)
.filter(f.col("num") <= 1)
).show()
Expected behavior
Running `pyre check` should not report any errors, as the code is valid.
See the docs:
Logs
$ pyre check
ƛ Found 16 type errors!
sample.py:9:23 Unsupported operand [58]: `+` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:9:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `int`.
sample.py:10:23 Unsupported operand [58]: `-` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:10:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `int`.
sample.py:11:23 Unsupported operand [58]: `*` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:11:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `int`.
sample.py:12:23 Unsupported operand [58]: `/` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:12:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `float`.
sample.py:13:12 Unsupported operand [58]: `>` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:13:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.
sample.py:14:12 Unsupported operand [58]: `>=` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:14:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.
sample.py:15:12 Unsupported operand [58]: `<` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:15:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.
sample.py:16:12 Unsupported operand [58]: `<=` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:16:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.
From my inspection, part (or even all) of this issue is caused by Pyre only treating a function declared with `def __add__(self, ...): ...` as a magic method of a class. `Column`, however, defines its `__add__` and many other magic methods via assignment:
__add__ = cast(
    Callable[["Column", Union["Column", "LiteralType", "DecimalLiteral"]], "Column"],
    _bin_op("plus"),
)
(see https://github.com/apache/spark/blob/187e9a851758c0e9cec11edab2bc07d6f4404001/python/pyspark/sql/column.py#L235-L274)
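A self-contained mimic of that pattern, runnable without pyspark (the class and helper names here are illustrative, not Spark's actual code), reproduces the same false positive:

```python
from typing import Callable, cast


def _make_add() -> Callable[["FakeColumn", int], "FakeColumn"]:
    # Simplified stand-in for pyspark's _bin_op helper: returns a function
    # that will be installed on the class as an operator method.
    def op(self: "FakeColumn", other: int) -> "FakeColumn":
        return FakeColumn(self.value + other)
    return op


class FakeColumn:
    def __init__(self, value: int = 0) -> None:
        self.value = value

    # The pattern Pyre trips over: __add__ is assigned (through cast),
    # not declared with `def`.
    __add__ = cast(Callable[["FakeColumn", int], "FakeColumn"], _make_add())


c = FakeColumn(1) + 2  # works fine at runtime
print(c.value)  # 3
```

At runtime the assigned `__add__` behaves exactly like a `def`-declared one, but Pyre reports `Unsupported operand [58]` on the `+` expression.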
Here is a smaller demo that makes the problem easier to see:
class MyInt:
    int_val: int = 0

    def real_add_func(self, other_int: int) -> int:
        return self.int_val + other_int

    __add__ = real_add_func

a_int: int = MyInt() + 0
Pyre Playground reproduction: here
Pyre gives a false positive: 10:13: Unsupported operand [58]: `+` is not supported for operand types `MyInt` and `int`.
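For comparison, a hypothetical variant of the demo where the only change is that `__add__` is declared with `def` rather than assigned. Runtime behavior is identical, and the expectation is that Pyre then resolves the operator normally and the error disappears, which supports the diagnosis above:

```python
class MyIntDef:
    int_val: int = 0

    # Same logic as real_add_func above, but declared directly as the
    # magic method instead of being assigned to __add__.
    def __add__(self, other_int: int) -> int:
        return self.int_val + other_int


a_int: int = MyIntDef() + 1
print(a_int)  # 1 (0 + 1)
```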