xarray Why both "backend" and "engine"?

What is your issue?

I've always felt this was unnecessarily confusing. We have multiple "backends" that are selected through the engine kwarg to open_dataset, which ultimately calls an instance of a BackendEntrypoint subclass. Most of the internal implementation is not called Engine-anything, though we do have a function guess_engine.
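To make the naming split concrete, here is a toy model of the relationship described above: a user-facing "engine" string that is just a lookup key for a "backend" class. It is a sketch only, not xarray's implementation. All of the `Toy*` / `toy_*` names and the `"toy-netcdf"` / `"toy-zarr"` keys are hypothetical, chosen to mirror the roles of open_dataset, guess_engine and BackendEntrypoint.

```python
# Toy model only: every name below is hypothetical and is NOT xarray's code.
from typing import Dict, Optional, Type


class ToyBackendEntrypoint:
    """Stand-in for the role a BackendEntrypoint subclass plays."""

    def guess_can_open(self, filename: str) -> bool:
        return False

    def open_dataset(self, filename: str) -> dict:
        raise NotImplementedError


class ToyNetCDFBackend(ToyBackendEntrypoint):
    def guess_can_open(self, filename: str) -> bool:
        return filename.endswith(".nc")

    def open_dataset(self, filename: str) -> dict:
        return {"source": filename, "opened_by": "toy-netcdf"}


class ToyZarrBackend(ToyBackendEntrypoint):
    def guess_can_open(self, filename: str) -> bool:
        return filename.endswith(".zarr")

    def open_dataset(self, filename: str) -> dict:
        return {"source": filename, "opened_by": "toy-zarr"}


# The "engine" the user types is only a key into a table of "backends".
TOY_ENGINES: Dict[str, Type[ToyBackendEntrypoint]] = {
    "toy-netcdf": ToyNetCDFBackend,
    "toy-zarr": ToyZarrBackend,
}


def toy_guess_engine(filename: str) -> str:
    """Return the first engine name whose backend claims the file."""
    for name, backend_cls in TOY_ENGINES.items():
        if backend_cls().guess_can_open(filename):
            return name
    raise ValueError(f"no toy engine can open {filename!r}")


def toy_open_dataset(filename: str, engine: Optional[str] = None) -> dict:
    """The keyword is called 'engine', but what it selects is a backend."""
    if engine is None:
        engine = toy_guess_engine(filename)
    backend = TOY_ENGINES[engine]()  # instantiate the selected backend class
    return backend.open_dataset(filename)
```

In this sketch, `toy_open_dataset("a.nc")` and `toy_open_dataset("a.nc", engine="toy-netcdf")` take the same path: the two words name the same thing seen from the caller's side and from the implementation's side.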

Why not open_dataset(backend=...) or have an EngineEntrypoint internally?

It is probably too late to actually change either of these at this point though.

Sep 13 '24 00:09 TomNicholas

The "engine" name as a keyword argument on open_dataset() was copied from pandas.read_csv.

The "backend" name dates to the very earliest days of Xarray's open source life (see @akleeman's https://github.com/pydata/xarray/commit/08f8f29736a3e18d90f9f287b85ea3d8a63c5064#commitcomment-4685056), when there was an idea that Dataset objects could be backed by netCDF and other file formats on disk.

Sep 13 '24 01:09 shoyer

What I also find confusing is the generic term "backend" used here, especially since many things in Xarray may now be viewed as "backends": IO, duck array types, parallel computing frameworks, etc.

(Not a big deal either).

Sep 27 '24 12:09 benbovy

Yes, and the same for "engine" - if we were renaming everything today it would probably make most sense to call the ChunkManager (for different parallel execution frameworks) the "engine", as that is analogous to a SQL query execution engine.

Sep 27 '24 13:09 TomNicholas