wfdb-python
Better API for specifying return data of signals
The current `rdrecord` API is rather confusing when dealing with more complex records/signals, namely multi-frequency signals.
We should create a better API to allow users to:
- Specify/know the exact return types/dimensions of the signals being read
- Deal with single and multi-frequency records consistently.
In addition, the current API is also limiting when reading single-frequency records, as there is no way for users to explicitly specify whether they want 1d or 2d numpy arrays.
We currently have: https://github.com/MIT-LCP/wfdb-python/blob/0d42dfb4b2946625f00cbf500d830d374a201153/wfdb/io/record.py#L1715 where the only parameter that influences the return type is `smooth_frames`.
We've chatted about the issue here, where I suggested a new set of params: https://github.com/MIT-LCP/wfdb-python/pull/313#issuecomment-1058363761
What do people think of the suggestion of having two params: `return_dims` and `smooth_frames`? `return_dims` will apply to both single and multi-frequency signals, and `smooth_frames` will apply only to multi-frequency signals. Given the combination of the params, the return value of the signals will be:
- (1, True): List of 1d arrays. Guaranteed to be the same length.
- (1, False): List of 1d arrays. Not guaranteed to be the same length for multi-frequency signals.
- (2, True): Single 2d array.
- (2, False): Illegal combination.
@tompollard @bemoody @alistairewj thoughts?
As an application programmer, I want the data. I actually don't care what data structure is used to hold it, as long as the structure is consistent for all input records, and as long as it isn't grossly inefficient.
If you give me an NxM array, and what I really want is an M-element list of N-element arrays, I can very easily call list(x.transpose()) myself. As a library programmer, I think it's better to put that sort of trivial burden onto applications, rather than making the library interface more complicated.
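To make that point concrete, converting between the two shapes is a one-liner in NumPy (a minimal sketch with a synthetic array, not actual wfdb output):

```python
import numpy as np

# Synthetic "signal" array: N = 5 samples, M = 2 channels (illustration only).
sig = np.arange(10).reshape(5, 2)

# An N x M array becomes an M-element list of N-element arrays in one call.
channels = list(sig.transpose())

print(len(channels))   # 2 channels
print(channels[0])     # first channel: [0 2 4 6 8]
```

The reverse direction is just as short (`np.column_stack(channels)`), which is the argument for keeping the library interface simple and leaving the reshaping to applications.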
In any case, there are several more combinations of arguments you didn't mention:
- return_dims=1 and smooth_frames is unspecified
- return_dims=2 and smooth_frames is unspecified
- smooth_frames=True and return_dims is unspecified
- smooth_frames=False and return_dims is unspecified
- neither return_dims nor smooth_frames is specified
Thanks. Some more thoughts:
- Many users will want the 2d array to pass into convolutional neural networks or general multi-channel filter functions.
- For multi-frequency records, the only way to avoid losing information is to return an array of 1d arrays.
I think it's worth it for our library to support both dimensions, despite the additional complexity. Regarding efficiency/performance, would either form perform a lot better?
For ease of writing the functions, though at a potentially large performance cost, we do have the option of always processing the signals as a list of 1d arrays, and then only reshaping into 2d before returning the value.
The function argument values can be required, and have defaults, so that we don't have to deal with the unspecified cases.
One-dimensional CNNs, yeah. A two-dimensional CNN (and a two-dimensional input array) would only make sense if the channels were ordered spatially, right? But I'm not here to argue with people's life choices. :)
There surely are many applications where storing each channel in contiguous memory could give slightly better performance than interleaving the channels. I believe numpy allows you to create arrays in "Fortran order" rather than "C order", so this wouldn't require any API change.
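As a quick illustration of that point (a sketch with a synthetic array, not wfdb code), NumPy does let you request column-major storage explicitly, so per-channel contiguity needs no API change:

```python
import numpy as np

# Synthetic 2d signal array (illustration only): 1000 frames, 4 channels.
sig = np.zeros((1000, 4))        # default "C order": channels interleaved per frame
sig_f = np.asfortranarray(sig)   # "Fortran order": each channel contiguous in memory

print(sig.flags['C_CONTIGUOUS'])   # True
print(sig_f.flags['F_CONTIGUOUS']) # True
# In Fortran order, a single channel is a contiguous view of memory:
print(sig_f[:, 0].flags['C_CONTIGUOUS'])  # True
```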
> The function argument values can be required, and have defaults, so that we don't have to deal with the unspecified cases.
Well, that's the trouble. The way you've outlined it, there isn't any default value for `return_dims` that makes sense. To be backward compatible, `return_dims` would have to be "1 if `smooth_frames` is false, 2 if `smooth_frames` is true".
As a database author, I care a lot about the defaults. I think it's important that the library should expose high-resolution data by default. I don't want to have to keep explaining to people that they need to pass a mysterious extra argument to prevent the library from smudging the data. There are a few ways this could be done:
- The library could provide both multi-frequency (`e_p_signal`) and low-resolution smooth-frames (`p_signal`) data by default (and I could tell people "just use `e_p_signal` and forget `p_signal` exists").
- The library could provide only multi-frequency data by default.
- The library could provide high-resolution smooth-frames data (upsampling, not downsampling).
However, I think the first option is the only way to really be backward compatible. There's no way to be backward compatible and provide e_p_signal and p_signal simultaneously and make p_signal high-res instead of low-res.
Let's not worry about backwards compatibility here. The current API behavior is not ideal, and does not allow the user to get a predictable output.
I didn't think about the high-res option. I'm not sure how useful it would be, but we could provide it. We would then have combinations of `return_dims` and `smooth_frames` (or a better name):
- (1, None): List of 1d arrays. Not guaranteed to be same length for multi-frequency signals.
- (1, 'average'): List of 1d arrays. Guaranteed to be same length.
- (1, 'upsample'): List of 1d arrays. Guaranteed to be the same length. (For `Record` objects, what happens to `fs`?)
- (2, None): Illegal combination.
- (2, 'average'): Single 2d array.
- (2, 'upsample'): Single 2d array.
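A minimal sketch of how a dispatcher over these combinations might behave (all names hypothetical, and the channels below are synthetic stand-ins for decoded signals; real frame smoothing/upsampling in wfdb would be more involved):

```python
import numpy as np

def format_signals(channels, return_dims, smooth_frames):
    """Hypothetical helper mapping (return_dims, smooth_frames) to an output shape.

    channels: list of 1d arrays, possibly of different lengths.
    smooth_frames: None, 'average', or 'upsample'.
    """
    if smooth_frames == 'average':
        # Downsample every channel to the shortest length by averaging
        # fixed-size frames (simplified: lengths must divide evenly).
        n = min(len(c) for c in channels)
        channels = [c.reshape(n, -1).mean(axis=1) for c in channels]
    elif smooth_frames == 'upsample':
        # Repeat samples so every channel matches the longest length.
        n = max(len(c) for c in channels)
        channels = [np.repeat(c, n // len(c)) for c in channels]
    if return_dims == 1:
        return channels                   # list of 1d arrays
    if return_dims == 2:
        if smooth_frames is None:
            raise ValueError("(2, None) is an illegal combination")
        return np.column_stack(channels)  # single 2d array
    raise ValueError("return_dims must be 1 or 2")

# Two channels at different sampling frequencies (synthetic).
chans = [np.arange(8.0), np.arange(4.0)]
print(format_signals(chans, 2, 'average').shape)   # (4, 2)
print(format_signals(chans, 2, 'upsample').shape)  # (8, 2)
```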
We could also add an INFO logging statement that occurs when reading a multi-frequency record, so that it's more immediately obvious.
I realize that all of this only considers the final output array(s) (such as when calling the current rdsamp), not the intermediate steps and the object attributes used to represent the signals. Let me think about that.
> I didn't think about the high-res option. I'm not sure how useful it would be, but we could provide it.
I'm honestly not sure how useful it would be either... but as you say, people like having the 2D array, and if someone insists on having a 2D array, it makes much more sense to upsample rather than downsample.
> I realize that all of this only considers the final output array(s) (such as when calling the current rdsamp), not the intermediate steps and the object attributes used to represent the signals.
I don't use wfdb.rdsamp, and I'd encourage anyone still using it to use wfdb.rdrecord instead. wfdb.rdsamp always returns a two-dimensional array, and I don't think it would be useful to change that behavior now.
rdrecord does not return an array or a list. It returns a Record object. A Record object has numerous attributes and methods. That means there's no reason a Record object cannot simultaneously provide access to many different views of the same data.
At the same time, I'm still not clear why it would be beneficial to provide data as a list-of-equal-length-arrays. I see that as being redundant (it's not any more efficient, and it's less useful for some applications, than a 2D array), and possibly confusing (at present if a Record has an e_p_signal then you know it hasn't been resampled.)
An idea I mentioned in the meeting today was to allow the `p_signal` (and `e_p_signal`, etc.) attributes to be lazily loaded, perhaps if you call `rdrecord` with `stream=True` or something like that.
A very rough sketch might look like this:
```python
def rdrecord(record_name, sampfrom=0, sampto=None, stream=False):
    record = rdheader(record_name)
    if sampto is None:
        sampto = record.sig_len
    if stream:
        # Defer reading: the attribute is a lazy proxy over the file.
        record.p_signal = Lazy2DSignalArray(record, sampfrom, sampto)
    else:
        # Eager path: read the whole requested range up front.
        record.p_signal = _read_2d_signals(record, sampfrom, sampto)
    return record


class Lazy2DSignalArray:
    def __init__(self, record, sampfrom, sampto):
        self._record = record
        self._base = sampfrom
        self._len = sampto - sampfrom

    def __len__(self):
        return self._len

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Translate the slice into file coordinates, then read on demand.
            (start, stop, step) = key.indices(self._len)
            data = _read_2d_signals(self._record,
                                    start + self._base,
                                    stop + self._base)
            return data[::step]
        elif isinstance(key, tuple):
            # Multi-axis indexing: resolve the frame axis, then index channels.
            return self[key[0]][key[1:]]
        elif hasattr(key, '__index__'):
            # Single frame: read a one-frame slice and unwrap it.
            return self[key:key + 1][0]
        else:
            raise TypeError(key)
```
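To sanity-check the slicing arithmetic in that sketch, here is a self-contained toy version where the hypothetical `_read_2d_signals` is replaced by a fake reader over an in-memory array (everything below is illustration, not wfdb code):

```python
import numpy as np

FULL = np.arange(20.0).reshape(10, 2)   # pretend on-disk record: 10 frames, 2 channels

def fake_read_2d_signals(sampfrom, sampto):
    """Stand-in for the hypothetical _read_2d_signals helper."""
    return FULL[sampfrom:sampto]

class LazyArrayDemo:
    def __init__(self, sampfrom, sampto):
        self._base = sampfrom
        self._len = sampto - sampfrom

    def __len__(self):
        return self._len

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._len)
            return fake_read_2d_signals(start + self._base,
                                        stop + self._base)[::step]
        elif isinstance(key, tuple):
            return self[key[0]][key[1:]]
        elif hasattr(key, '__index__'):
            return self[key:key + 1][0]
        raise TypeError(key)

lazy = LazyArrayDemo(2, 8)   # lazy view of frames 2..7
print(len(lazy))             # 6
print(lazy[0])               # [4. 5.] (frame 2 of the full record)
print(lazy[1:3])             # frames 3 and 4
print(lazy[0, 1])            # 5.0
```

Each access reads only the frames it needs, which is the whole point of the `stream=True` idea.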
Just catching up on this!
> Let's not worry about backwards compatibility here. The current API behavior is not ideal, and does not allow the user to get a predictable output.
What about deprecating `rdrecord` and introducing `wfdb.read_record()` or maybe just `wfdb.read()`?
> What do people think of the suggestion of having two params: `return_dims` and `smooth_frames`? `return_dims` will apply to both single and multi-frequency signals, and `smooth_frames` will apply only to multi-frequency signals.
Is `return_dims` necessary at all? The fact that it leads to situations like "(2, False): Illegal combination" isn't nice. Could we treat everything as 2d and maybe have some kind of binary `flatten` flag if necessary?
> What about deprecating `rdrecord` and introducing `wfdb.read_record()` or maybe just `wfdb.read()`?
That's the long-term plan. We want a major version where the old and new APIs are both available, and then another one that removes the old ones.
> Is `return_dims` necessary at all? The fact that it leads to situations like "(2, False): Illegal combination" isn't nice. Could we treat everything as 2d and maybe have some kind of binary `flatten` flag if necessary?
`flatten` essentially does the same thing as `return_dims`, and the illegal combination issue would still hold. I'd argue that it's a worse keyword to use, since flattening an array may mean different things, thus making the keyword less explicit.
Hot off the press: https://github.com/MIT-LCP/wfdb-python/pull/388
See the top-level functions at the bottom of the module demonstrating a non-streaming and streaming (needs a lot of work) API. My biggest takeaway while playing around with this: why does any class/object need to have the array as one of its attributes? Does it benefit the library, or the user? If we just separate the full data array from the class/object, we can remove the `p_signal`/`d_signal`/`e_p_signal` naming garbage.
^^ Thinking about it more, separating the signal data from the object may bring more confusion than benefits. Imagine reading a JPEG file into an `Image` object and not having the bytes as an object field. That would be very strange.