pyspark-style-guide

This is a guide to PySpark code style. It presents common situations and the associated best practices, based on the topics that recur most frequently across the PySpark repos we've encountered.

Results 5 pyspark-style-guide issues

**pylint** 2.7.2 **astroid** 2.5.1 **Python** 3.9.2 (default, Feb 24 2021, 13:26:01) When running pylint with those checkers enabled on a file that has an unnecessary split statement, I encounter...

@asmello as discussed, it's better style to write complex filters:

```python
df.where(F.col('pokemon').isNull() & ~F.col('cards').isNull())
```

Rather than chain filters:

```python
df.where(F.col('pokemon').isNull()).filter(~F.col('cards').isNull())
```

(Taking into account https://github.com/palantir/pyspark-style-guide#refactor-complex-logical-operations)

To be consistent with the concept of first selecting the input needed for a transform, should we also recommend doing that before a join? This would mean the good...
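The issue text above is cut off, so the following is only a sketch of one way to read the suggestion: select the columns a transform needs from each input, then join. The function name, the column names, and the choice of a left join are all hypothetical and are not taken from the guide or the issue.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only for type hints; not needed at runtime
    from pyspark.sql import DataFrame


def join_orders_to_customers(orders: DataFrame, customers: DataFrame) -> DataFrame:
    """Select the needed columns from each input, then join.

    The schema here (customer_id, order_id, amount, country) is a
    hypothetical example, not one from the style guide.
    """
    # Keep only the columns this transform actually uses.
    orders = orders.select("customer_id", "order_id", "amount")
    customers = customers.select("customer_id", "country")
    # Joining on a column name keeps a single customer_id column in the result.
    return orders.join(customers, on="customer_id", how="left")
```

Narrowing both inputs first makes the columns the join depends on visible at a glance, which is presumably the consistency the issue is asking about.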

When defining a function, it would be useful to follow a convention for PySpark DataFrame type hints, e.g.

```
from pyspark.sql import DataFrame
import pyspark.pandas as ps

def my_function(my_dataframe: DataFrame) ->...
```