sedona icon indicating copy to clipboard operation
sedona copied to clipboard

ST_Transform() does not produce the same results as geopandas.to_crs()

Open ameliaholcomb opened this issue 4 months ago • 2 comments

Using sedona 1.7.2, pyspark 3.5.5, and geopandas 0.14.4, I get different results from ST_Transform() and geopandas.to_crs().

To reproduce, see below:

# EASE 2.0 Grid (Lambert Cylindrical Equal Area)
EASE_2_WKT = """
PROJCS["WGS 84 / NSIDC EASE-Grid 2.0 Global",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84", 6378137, 298.257223563, 
                AUTHORITY["EPSG", "7030"]],
            AUTHORITY["EPSG", "6326"]],
        PRIMEM["Greenwich", 0, AUTHORITY["EPSG", "8901"]],
        UNIT["degree", 0.0174532925199433, 
            AUTHORITY["EPSG", "9122"]],
        AUTHORITY["EPSG", "4326"]],
    PROJECTION["Cylindrical_Equal_Area"],
    PARAMETER["standard_parallel_1", 30],
    PARAMETER["central_meridian", 0],
    PARAMETER["false_easting", 0],
    PARAMETER["false_northing", 0],
    UNIT["metre", 1, AUTHORITY["EPSG", "9001"]],
    AXIS["Easting", EAST],
    AXIS["Northing", NORTH],
    AUTHORITY["EPSG", "6933"]]
"""

def get_sedona():
    config = (
        SedonaContext.builder()
        .config(
            "spark.jars.packages",
            "org.apache.sedona:sedona-spark-3.5_2.12:1.7.2,"
            "org.datasyslab:geotools-wrapper:1.7.2-28.5,",
        )
        .config(
            "spark.jars.repositories",
            "https://artifacts.unidata.ucar.edu/repository/unidata-all",
        )
        .master("local[4,0]")
    ).getOrCreate()
    return SedonaContext.create(config)

def main():
    spark = get_sedona()

    xs = [41.57587966, 41.57617993, 41.57647863]
    ys = [-76.69285786, -76.69230338, -76.69175185]
    gdf = gpd.GeoDataFrame(
        {"geometry": gpd.points_from_xy(xs, ys)},
        crs="EPSG:4326",
    )
    # geopandas supports code "EPSG:6933" natively
    print(gdf.to_crs("EPSG:6933"))
    # passing the WKT explicitly produces the same results
    print(gdf.to_crs(EASE_2_WKT))

    sdf = spark.createDataFrame(gdf)
    sdf.createOrReplaceTempView("points")

    # this does not produce the same results
    sdf = spark.sql(
        f"SELECT ST_Transform(geometry, 'EPSG:4326', '{EASE_2_WKT}') as geometry_6933 FROM points"
    )
    print(sdf.show(truncate=False))

This prints:

                           geometry
0  POINT (4011501.977 -7143390.895)
1  POINT (4011530.948 -7143374.405)
2  POINT (4011559.769 -7143358.002)
                           geometry
0  POINT (4011501.977 -7143390.895)
1  POINT (4011530.948 -7143374.405)
2  POINT (4011559.769 -7143358.002)
+---------------------------------------------+
|geometry_6933                                |
+---------------------------------------------+
|POINT (4628205.753033619 -6191541.2203990985)|
|POINT (4628239.178937121 -6191526.927389243) |
|POINT (4628272.43006902 -6191512.709843179)  |
+---------------------------------------------+

ameliaholcomb avatar Aug 16 '25 19:08 ameliaholcomb