spikeinterface Force `_main_ids` to be strings

Related to the serialization thread (#1397), after discussing with @h-mayorquin, we thought that casting Base._main_ids as str always may solve several issues with keys not being json serializable. Allowing ids to be integers is mainly to maintain a similar behavior with spikeextractors, but now that ids and indices are clearly separated I think we can just use strings.

@samuelgarcia thoughts?

Mar 20 '23 08:03 alejoe91

I think this choice has to be unrelated to json stuff. I am not against having str only for ids. Having integer was also convinient in some cases. We have to see this like Dataframe.index which can be int, str and many more no ?

Lets maturate this idea more.

@magland : I think you use to have opinion on this : does channel_ids/unit_ids must str only or int/str ?

Mar 20 '23 13:03 samuelgarcia

I understand why forcing IDs to be str is desirable. However, I believe this would break many of the recordings/sortings that I work with, that use integer IDs. So not sure what is the best way forward here.

Mar 22 '23 00:03 magland

@magland could you elaborate why they would break? If you use, for instance, channel IDs to index or slice an array, that is not the way it should be because we have a clear api distinction between IDs and indices now. Do you have other breaking examples?

Mar 22 '23 06:03 alejoe91

I have a lot of existing code that uses integers for IDs when creating recordings/sortings. Frank lab also has recordings/sortings stored to disk in custom formats that I believe use integers for IDs.

Mar 22 '23 16:03 magland

For example, here's a line of code that I just wrote:

recording_signal = se.NumpyRecording([signal], sampling_frequency=recording_sub_whitened.sampling_frequency, channel_ids=np.arange(1, KK + 1, dtype=np.int32))

It would be cumbersome to convert that code to use strings instead of ints.

Mar 22 '23 18:03 magland

No we would do the conversion in the base object and under the hood. So if the downstream code does not depend on the IDs being int there's no problem

Mar 22 '23 18:03 alejoe91

So we'd be able to pass in ints for any of the API functions and they get translated to strings under the hood?

Mar 22 '23 18:03 magland

Well normally you would then the channel_ids from the object and pass them around!

Mar 22 '23 19:03 alejoe91

I believe this would still break my code in various places (some unexpected), like when I get the channel IDs and compare them to integers... or check to see if an integer is among the channel IDs. But if you decide to go that route, I guess I'll just have to deal with it. From a design perspective I think it is the right choice... but the fact is, it will cause some things to break for me.

Mar 22 '23 19:03 magland

I see! I agree it's better going forward. Probably also better to switch sooner than later!

Mar 22 '23 19:03 alejoe91

Maybe it should wise to deprecate explcitly in next release (actual master) and remove ids integer in the following one. So anyone could figure out which part of code will be broken in advance. This would be more smart no ?

Mar 22 '23 19:03 samuelgarcia

Maybe it should wise to deprecate explcitly in next release (actual master) and remove ids integer in the following one. So anyone could figure out which part of code will be broken in advance. This would be more smart no ?

I think that would be problematic, since you'd get a bunch of deprecation warnings for things that aren't really going to be problems. Like if I set the unit IDs as integers, that is still going to be supported according to @alejoe91. What you can't test for is how the user interprets the output of .unit_ids or .channel_ids.

Mar 22 '23 20:03 magland

Yep ;)

Mar 22 '23 20:03 alejoe91

I not sure to support the magical and hidden convertion : int to str in the base class. Magic stuff are futurs bugs!

Mar 22 '23 20:03 samuelgarcia

Now that I think more about this, I think the problem is not that int IDs are supported... the problem is that int IDs are used as the keys in dicts. A different solution would be for you to always use the stringified versions of IDs when creating dicts. When a user passes in a dict with ID keys, they need to make sure they are string keys. Similarly when a function outputs a dict, it needs to make sure the keys are strings. Then if a user reads a dict they would need to do X[str(id)] ...

Mar 22 '23 20:03 magland

I not sure to support the magical and hidden convertion : int to str in the base class. Magic stuff are futurs bugs!

If you don't support int's being passed in, it's going to break a lot of my existing code as well as in spyglass.

Mar 22 '23 20:03 magland

No worry, the idea is not to break codes! The key as int in dict is one aspect one the problem, the other is mainly to have something clear without ambiguity for a new user.

For me key as int and str is OK. Key as str only would be better. I think that key that are transformed along the way (in base or when making dict or whenever) is a bad solution.

So the 2 options I like are:

keep the actual behavior (int+str), we lived with this since a while without major drawback
str only everwhere since init of the class (but this break code)

All other solutions that make hidden and magic cast to str are a design error I think.

Mar 23 '23 07:03 samuelgarcia

Regarding json serialization, the following could be a better strategy

Replace things like this:

{
  1: [...],
  2: [...],
  ...
}

with things like this

{
  'locations': [
    {'id': 1, 'location': [...]},
    {'id': 2, 'location': [...]},
    ...
  }
}

Mar 23 '23 10:03 magland

This would require a big refactoring and I don't think it's needed. We have functions to make dictionaries serializable so it's now a solved problem. I think that by design, it's better to move forward with strings only (you would still be able to pass in integers in the Numpy constructor). I don't expect many things to break honestly, but we can make a separate branch that you can test against (same with spyglass). @khl02007 let us hear your opinion too!

Mar 23 '23 10:03 alejoe91

I guess we never did agree on this one, did we?

Can we do this with a very slow deprecation cycle. Start taking throwing deprecation warnings for ints when setting channel_ids and unit_ids?

May 03 '24 00:05 h-mayorquin

Honestly, having unit_ids as int for sorting is really really convinient in many sututation. For instance in Numpy.from_time_label where the labels become ids.

May 03 '24 07:05 samuelgarcia

Is it possible to introduce a getter _get_main_ids(dtype="int") so representation is str under the hood but can grab as int if required / make an easy find & replace on existing code?

May 03 '24 07:05 JoeZiminski

Honestly, having unit_ids as int for sorting is really really convinient in many sututation. For instance in Numpy.from_time_label where the labels become ids.

Can't we just use ids_to_indices there or is it more complicated than that?

May 03 '24 14:05 h-mayorquin

spikeinterface spikeinterface copied to clipboard

Force `_main_ids` to be strings

spikeinterface
spikeinterface copied to clipboard