
Use `narwhals` as dataframe-agnostic backend

Open jessegrabowski opened this issue 3 weeks ago • 2 comments

Description

Continues/completes #7462. I didn't have permission to push into that PR, so I'm opening this one.

The purpose of this PR is to use narwhals as a one-stop-shop dataframe backend. Currently, we use pandas in data.py and pytensorf.py so that users can pass dataframe objects into pm.Data and pt.as_tensor, respectively. I add a narwhals compatibility layer between the input and the PyMC model, so users can bring their data in any form that narwhals supports, provided we register the corresponding library.

(If we could eliminate registration that would also be great, but I wasn't clever enough to figure out the multiple dispatch using only narwhals as a dependency. Maybe @MarcoGorelli could help 👉 👈 )
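For readers unfamiliar with the registration pattern, here is a minimal sketch using only the standard library's `functools.singledispatch`. It is not the PR's code: `ToyFrame` and `to_rows` are hypothetical names invented for illustration, and the real converters would target actual dataframe types rather than a toy class.

```python
from functools import singledispatch


class ToyFrame:
    """Hypothetical stand-in for a third-party dataframe type."""

    def __init__(self, columns):
        self.columns = columns  # dict mapping column name -> list of values


@singledispatch
def to_rows(data):
    """Fallback: reject types whose backend has not been registered."""
    raise TypeError(f"No converter registered for {type(data).__name__}")


@to_rows.register
def _(data: list):
    # Plain nested lists are passed through unchanged.
    return data


@to_rows.register
def _(data: ToyFrame):
    # Transpose column-oriented storage into row-oriented lists.
    return [list(row) for row in zip(*data.columns.values())]


rows = to_rows(ToyFrame({"x": [1, 2], "y": [3, 4]}))
```

The cost of this design is visible in the sketch: every supported type needs its own `register` call, and any unregistered type falls through to the `TypeError`.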

Some notes:

  1. Since generalized dataframes don't have a notion of an index, we don't look at the index to find the labels for the left-most dimension provided to pm.Data. Instead, we look for a column matching that dimension name. If it is found, it is used as labels, and excluded from the values.
  2. Narwhals has a lazy API via nw.LazyFrame. I don't think we can do anything with lazy objects at the modeling level (maybe in the future with minibatching?). For now, I'm just calling .collect() on them to make them eager.
  3. As mentioned above, we don't get all of narwhals for free yet: the PR as it currently stands forces us to register each backend library we want to support. I don't think this is so bad, because it forces us to write tests for, say, DuckDB if someone comes along and really wants that. But it's a bit ugly. I added a dask.dataframe backend as an example of how we could extend things.

As of this PR, pandas could be made an optional or dev-only dependency for us. I didn't do it right away because I wanted to take people's temperature on the idea.

Related Issue

  • [ ] Closes #7462 Closes #7463
  • [ ] Related to #

Checklist

Type of change

  • [x] New feature / enhancement
  • [ ] Bug fix
  • [ ] Documentation
  • [ ] Maintenance
  • [ ] Other (please specify):

jessegrabowski, Nov 17 '25 00:11