[Bug]: `SparkLikeLazyFrame.rename` cannot handle periods in names
Describe the bug
hey everyone,
not sure if this has been discussed elsewhere, but I ran into an issue renaming a Spark DataFrame with periods in its column names (a somewhat common situation).
historically I've resolved this using `.withColumnsRenamed()`, but noticed that narwhals is using `.select()` - either approach works, with some minor differences.
wondering if we could do one of two things:
- replace `select` with `withColumnsRenamed` (`rename_mapping` remains unchanged)
- surround the keys in `rename_mapping` with backticks (`self.native.select` remains unchanged)
https://github.com/narwhals-dev/narwhals/blob/ec5f4967bf7ea3513a9d0fa1cc837985b612e0f7/narwhals/_spark_like/dataframe.py#L358-L366
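For illustration, the two options might look roughly like this in plain PySpark (the `mapping` dict below is made up for the example, and the dict form of `withColumnsRenamed` assumes pyspark >= 3.4):
from pyspark.sql import functions as F
mapping = {"a": "A", "b": "B", "c.d": "C_D"}  # hypothetical rename mapping
# option 1: withColumnsRenamed accepts dotted source names as-is
renamed = spark_dataframe.withColumnsRenamed(mapping)
# option 2: keep the select-based rename, but backtick-quote the source name
# so Spark reads "c.d" as a single column rather than struct field access
renamed = spark_dataframe.select(
    *[F.col(f"`{old}`").alias(new) for old, new in mapping.items()]
)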
Steps or code to reproduce the bug
create a PySpark DataFrame with a column that has a dot in its name
import pandas as pd
from sqlframe.spark import SparkSession
spark = SparkSession.builder.getOrCreate()
temp = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
spark_dataframe = spark.createDataFrame(temp).withColumnRenamed("c", "c.d")
spark_dataframe.columns
#>: ["a", "b", "c.d"]
try to rename the column using narwhals
import narwhals as nw
narwhals_dataframe = nw.from_native(spark_dataframe)
mapping = {column: column.replace(".", "_").upper() for column in narwhals_dataframe.columns}
renamed = narwhals_dataframe.rename(mapping)
renamed.to_native().show()
#>: AnalysisException
Expected results
renamed.columns
#>: ["A", "B", "C_D"]
Actual results
raises an `AnalysisException`
AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `c`.`d` cannot be resolved
Please run narwhals.show_version() and enter the output below.
System:
python: 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0]
executable: /usr/bin/python3
machine: Linux-6.1.123+-x86_64-with-glibc2.35
Python dependencies:
narwhals: 2.3.0
numpy: 2.0.2
pandas: 2.2.2
modin:
cudf: 25.6.0
pyarrow: 18.1.0
pyspark: 3.5.1
polars: 1.25.2
dask: 2025.5.0
duckdb: 1.3.2
ibis: 9.5.0
sqlframe: 3.40.2
Relevant log output
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
/tmp/ipython-input-3169437464.py in <cell line: 0>()
11 narwhals_dataframe = nw.from_native(spark_dataframe)
12 mapping = {column: column.upper() for column in narwhals_dataframe.columns}
---> 13 narwhals_dataframe.rename(mapping).to_native().show()
7 frames
/usr/local/lib/python3.12/dist-packages/pyspark/errors/exceptions/captured.py in deco(*a, **kw)
183 # Hide where the exception came from that shows a non-Pythonic
184 # JVM exception message.
--> 185 raise converted from None
186 else:
187 raise
AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `c`.`d` cannot be resolved. Did you mean one of the following? [`t32279404`.`a`, `t32279404`.`b`, `t32279404`.`c.d`].; line 1 pos 293;
'WithCTE
:- CTERelationDef 9, false
: +- SubqueryAlias t28915597
: +- Project [cast(a#49 as bigint) AS a#42L, cast(b#50 as bigint) AS b#43L, cast(c#51 as bigint) AS c#44L]
: +- SubqueryAlias a11
: +- LocalRelation [a#49, b#50, c#51]
:- CTERelationDef 10, false
: +- SubqueryAlias t32279404
: +- Project [a#42L, b#43L, c#44L AS c.d#45L]
: +- SubqueryAlias t28915597
: +- CTERelationRef 9, true, [a#42L, b#43L, c#44L], false
:- 'CTERelationDef 11, false
: +- 'SubqueryAlias t12673183
: +- 'Project [a#42L AS a#46L, b#43L AS b#47L, 'c.d AS c.d#48]
: +- SubqueryAlias t32279404
: +- CTERelationRef 10, true, [a#42L, b#43L, c.d#45L], false
+- 'GlobalLimit 20
+- 'LocalLimit 20
+- 'Project ['a AS A#39, 'b AS B#40, '`c.d` AS C.D#41]
+- 'SubqueryAlias t12673183
+- 'CTERelationRef 11, false, false
thanks @lucas-nelson-uiuc for the report! our minimum pyspark version is 3.4.0, so happy to use `withColumnsRenamed`
@MarcoGorelli sorry, I might have re-opened this too soon. I thought it was a mistake since the PR only adds a test, but the test doesn't fail without editing the codebase?
Edit: Never mind, the issue is renaming from a dotted name to anything else.
thanks @FBruzzesi
yup - @lucas-nelson-uiuc as noted in the linked PR, we can't use `withColumnsRenamed` here unfortunately
I'd suggest renaming outside of Narwhals for now
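As a sketch of that workaround (assuming the dotted column only needs a Spark-level rename before wrapping; the `c_d` name is made up for the example):
# rename at the Spark level first, then hand the cleaned frame to narwhals
clean = spark_dataframe.withColumnRenamed("c.d", "c_d")
narwhals_dataframe = nw.from_native(clean)
renamed = narwhals_dataframe.rename({c: c.upper() for c in narwhals_dataframe.columns})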
The thing is that this is not a rename-only issue. `select` and `with_columns` break as well:
import narwhals as nw
import pandas as pd
from sqlframe.spark import SparkSession
spark = SparkSession.builder.getOrCreate()
temp = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
spark_dataframe = spark.createDataFrame(temp).withColumnRenamed("c", "c.d")
nw.from_native(spark_dataframe).select("c.d").collect("pandas")
AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `c`.`d` cannot be resolved. Did you mean one of the following? [`t33636136`.`a`, `t33636136`.`b`, `t33636136`.`c.d`]. SQLSTATE: 42703; line 1 pos 251;
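At the Spark level the ambiguity can be avoided by backtick-quoting the name (a plain-PySpark sketch, using the reproduction above):
spark_dataframe.select("c.d")    # fails: parsed as struct field access c -> d
spark_dataframe.select("`c.d`")  # works: resolves the literal column named "c.d"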
@MarcoGorelli one option to "fail early" is to do a check when we wrap a dataframe, in the same way we check for duplicate column names in pandas. That way we have a bit more control and can suggest what to do, rather than failing later on.
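A rough sketch of what such an early check could look like (the function name and error message here are hypothetical, not Narwhals API):
def check_column_names(columns: list[str]) -> None:
    # hypothetical validation run when wrapping a Spark-like frame:
    # dotted names break Spark's column resolution downstream, so fail fast
    dotted = [name for name in columns if "." in name]
    if dotted:
        msg = (
            f"Column names containing '.' are not supported for Spark-like backends: {dotted}. "
            "Consider renaming them (e.g. with `withColumnRenamed`) before passing the frame to Narwhals."
        )
        raise ValueError(msg)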