fix: unify `ColumnNotFound` for `duckdb` and `pyspark`
What type of PR is this? (check all applicable)
- [ ] 💾 Refactor
- [ ] ✨ Feature
- [x] 🐛 Bug Fix
- [ ] 🔧 Optimization
- [ ] 📝 Documentation
- [ ] ✅ Test
- [ ] 🐳 Other
Related issues
- Closes #2472
Checklist
- [x] Code follows style guide (ruff)
- [x] Tests added
- [x] Documented the changes
If you have comments or can explain your changes, please do so below
I think I can make some other clean-up of repetitive code. I'll try tomorrow morning
I made a followup PR #2495 with the cleanup :)
I finally had a few minutes to go back to this 🥲
Few notes:
-
catch_{}_exceptionmade it cleaner thanks for the tip -
sqlframeis tricky because it raises at collect time. Also the error will be different based on the backend. Do we have a way to know which type ofsqlframewe are dealing with. Forduckdbwe could use thecatch_duckdb_exceptionbut we need to make sureduckdbis available. Any ideas? Could we think about it in a follow up? - regarding
simple_select, it should already be covered by the below. We are already testing it too: https://github.com/narwhals-dev/narwhals/blob/b5f72dd36926d8be9506754cd9e92f8a855a42b8/narwhals/dataframe.py#L165-L181 Would you like to do something else for lazy backends?
sqlframeis tricky because it raises at collect time. Also the error will be different based on the backend. Do we have a way to know which type of sqlframe we are dealing with?
@EdAbati IIRC there's some import-related functions in _spark_like.utils that may be helpful for you?
I'm assuming this would work in a similar way, where you either need to use some Base* class or use the string in the concrete type in an import path
Found some time to update this. (sorry for the late reply)
@EdAbati IIRC there's some import-related functions in _spark_like.utils that may be helpful for you?
The problem IMO is that since sqlframe lets backend raise their errors we can only intercepts the ones of the backends we also support (i.e. pyspark and duckdb)
Not sure if the best solution would be to let sqlframe do their error handling or to intercept the errors just for the backend we support.
Maybe it should be discussed in an issue/follow-up?
@MarcoGorelli is there anything that you think is missing now? :)
thanks! i think the logic looks right, the tests look a little complex but maybe that's ok. will get back to this shortly
The problem IMO is that since sqlframe lets backend raise their errors we can only intercepts the ones of the backends we also support (i.e. pyspark and duckdb) Not sure if the best solution would be to let sqlframe do their error handling or to intercept the errors just for the backend we support. Maybe it should be discussed in an issue/follow-up?
Yeah that's fine - I think in general it's ok to aim for "we try to unify what we can, but there may be some differences that we have no control over"
Thank you @MarcoGorelli for the review and for the patience with this one 🥲❤️
@EdAbati @MarcoGorelli does this error seem related to this PR?
- https://github.com/narwhals-dev/narwhals/actions/runs/16328848063/job/46125919696
I haven't touched anything I swear 😂
does this error seem related to this PR?
- https://github.com/narwhals-dev/narwhals/actions/runs/16328848063/job/46125919696
yeah It looks like the "stricter" regex (that start with ^) are not ok for polars 0.20.22 :(
I have some time to update this tonight, I'll make a PR if no one does it before me :)