scikit-fda icon indicating copy to clipboard operation
scikit-fda copied to clipboard

`FDataGrid.copy` argument `sample _names` default behaviour

Open ego-thales opened this issue 2 years ago • 2 comments
trafficstars

Hello,

I sometimes use a "skeleton" for a FDataGrid instance, containing no samples but lots of information (grid points, domain_range, interpolation, etc...).

I would like, given a new data_matrix (generated from another process), to create an instance following the skeleton, with:

fd = skeleton.copy(data_matrix=data_matrix)

but the problem with that, is that copy isn't happy with the fact that I don't give sample names.

I can easily fix this with

fd = skeleton.copy(data_matrix=data_matrix,
                   sample_names=skeleton.sample_names + (None,) * len(data_matrix))

but I think the default behaviour of copy might be changed for a smoother use. What do you think? Another possibility could be to turn to concatenate and allow it to be used with a data_matrix argument and not only FDataGrid instances.

ego-thales avatar Oct 24 '23 09:10 ego-thales

I wanted since a long time to rework the classes used to represent functions, in order to make them more general and move them to a separate package. Thus, some of the design choices taken for scikit-fda will be revisited.

In particular, I agree that metadata attributes, such as sample_names complicate a lot of things. On one hand, they are interesting for plotting and exploratory analysis. On the other hand, they work really bad with operations. For example, when taking the mean of a functional dataset, the sample_names need to be removed. This is not a new problem. Other libraries, such as xarray or scipp that deal with arrays including labels or units need to do something about it too. We should check what they do.

As for the copy method itself, I am unsure of the best way to deal with it. Right now it is more of an "utility" method, used for creating identical or almost identical copies of a FData object, that is why by default it tries to preseve everything. However I agree that this is problematic, and can create subtle bugs if one forgets about that behavior. I would love to have your feedback on how to better implement it.

vnmabus avatar Oct 24 '23 09:10 vnmabus

Other libraries, such as xarray or scipp that deal with arrays including labels or units need to do something about it too. We should check what they do.

I personally don't know these packages at all, so I would not be able to give you any insight at the moment.

As for the copy method itself, I am unsure of the best way to deal with it. [...] I would love to have your feedback on how to better implement it.

I don't have any strong opinion, but my main observation would be that it is currently impossible to set any argument to attr=None, because it is then treated as self.attr. I think this should be fixed, while retaining default behaviour.

skeleton.copy(data_matrix=data_matrix)  # Raises error if data_matrix has different number of samples
skeleton.copy(data_matrix=data_matrix, sample_names=None)  # Actively deleting sample names

This way, no sample name is silently deleted, but if the user encounters this error, she/he can fix it more easily.

As for implementation detail, I don't know what the best practise would be. Creating a DEF object to replace the default None values?

def copy(self: T, *, data_matrix=DEF, grid_points=DEF, [...]) -> T

There might be more pythonic ways I don't know of.

eliegoudout avatar Nov 09 '23 22:11 eliegoudout