mne-python
mne-python copied to clipboard
Facilitate export to XArray
Describe the problem
XArray is very convenient when doing data exploration as it often allows for a more transparent code by using dimension names rather than indexes. It would be very useful if MNE was providing XArray export functions where it already provides exports to NumPy array. For example, whereas someone would currently use
ica.get_sources(epochs).get_data().mean(2).mean(0)
by adding a to_xarray() method, we could use the following alternative code
ica.get_sources(epochs).to_xarray().mean(["time", "epochs"])
# or ica.get_sources(epochs).to_xarray().mean("time").mean("epochs")
Personally, I found the second version much easier to understand. With the first version, I would typically have to print the shape of the data to infer what axis corresponds to what to understand what operation is being performed. The first form (with dimension indexes) is also much more error-prone because all sorts of things can affect the axes ordering whereas dimension names and coordinates generally remain consistent and generate much more transparent errors when they don't.
no objection. We have to_data_frame so we can add a to_xarray following the same approach (using nested import)
Or maybe add a kwarg to get_data, output='ndarray' (default) or 'xarray'. I think I'd prefer that to a new function
it's just harder to discover
Indeed, but less API / kwarg duplication
I’d prefer dedicated functions for each export. It’s how all other packages behave that I’ve ever encountered in my scientific Python stack.
Any objections to adding xarray in the requirements.txt? I am working on a different PR related to this https://mne.discourse.group/t/exporting-surface-sources-as-nifti-volume/5518/3 I'd used xarray in that PR, but that would need adding this dependency. I am thinking if we were to implement this feature request (I can do a PR for this), it would require adding this dependency anyway. I think XArray is a great component of Python scientific stack and worth integrating. I want to check first as I understand that there are also valid arguments for keeping dependencies as light as possible.
@christian-oreilly adding a dependency on the project is not cheap in terms of maintenance and technical debt. Can you motivate a bit this decision?
I don't think you should need xarray
to convert a surface to NIfTI at least
@agramfort Yes, I agree that adding dependencies must not be a rash decision. Some thoughts on that: 1 - The XArray export functionality would require importing XArray. You were thinking of it more as an extra/optional import required and checked for only when calling such an export function? 2 - I do use it for the intended PR for things like that:
source_data = xr.DataArray(stc.data, dims=["sources", "time"]) \
.rolling(time=time_downsample_factor, center=True) \
.mean()[:, time_downsample_factor//2:-time_downsample_factor//2:time_downsample_factor]
Of course, there is nothing there that cannot be implemented in plain Numpy or with Pandas (since in this specific context, I am only in 2D), but I think XArray significantly adds to readability as it allows operating on axes not using integers but with named dimensions. Compares
source_data.transpose("time", "sources")
source_data.sel(time=source_data.time > 0).mean("time")
with
source_data.transpose(1, 0):
source_data[:, time > 0].mean(1)
# Note also that here, time needs to be maintained
# as a separate variable. XArray can maintain it internally
# as the coordinates for the given dimension.
for example. The former is explicit, whereas the latter requires introspecting the objects to understand what is what. As much as Pandas is a game changer (compared to using Numpy only) when doing data analysis in general, XArray is a game changer when working with multidimensional data that can be represented as dense blocks (e.g., time X channel X epochs X ...); less so if not dense like if stacking data across subjects (e.g., time X channel X epoch X subject) where there is a large variability in the numbers of epochs per subject. 3 - XArray is an open door for using Dask (distributed computing) to scale up analyses toward "bigger" data applications. 4 - In the long term, I think mne could benefit from using XArray for some of its internal operations (to make them simpler and more readable).
Note that I have no particular involvement with XArray (i.e., I have no conflicting interests). I am just a happy customer :)
@larsoner Agreed. It is not a blocker for converting a surface to NIfTI. If available, I'll use it. If not, I'll get around. So this does not need to be settled for this specific PR.