pyedflib
pyedflib copied to clipboard
"sample_frequency" is a misleading name for the number of values in a signal
Hi, since v0.1.23 the getSampleFrequency method has changed the value it returns from the actual sample rate (aka the number of samples per second) to the number of samples in the signal.
This is also documented in code. I don't know the reason why it changed but there probably was a legit reason.
Now when I used the library I stumbled upon this and tracked down the value coming from getSampleFrequency to this line: https://github.com/holgern/pyedflib/blob/master/pyedflib/_extensions/_pyedflib.pyx#L358 which says:
return <double>self.hdr.signalparam[channel].smp_in_datarecord
So the actual C edflib interface calls this not sample frequency but smp_in_datarecord, which is a much more appropriate name IMO. I really struggled with my code and thought: Well I know I have a sample rate of 500 Hz in my data, but now the file says it is 5000 (my signals are 10 seconds), am I producing the wrong files? (sidenote, I only read BDF files with pyedflib, I'm not writing them with this lib)
This confusion IMO comes solely from the name. I'm happy to calculate the frequency (or rate or however you would like to call it) myself, but I'm 100% sure that someone else later in the code will see my calculation and say "hey there is this getSampleFrequency method provided out of the box, let's use this".
What do you think about changing the name of the attribute altogether and removing any ambiguities?
I just found Py-EDFLib, which seems to be doing the old calculation (e.g. seeing frequency of a number occurence per time interval): https://gitlab.com/Teuniz/EDFlib-Python/-/blob/master/src/EDFlib/edfreader.py#L509
I stumbled upon this issue as I fell for the same problem after an update to 0.1.23
I think @gregmuellegger 's point is valid. I looked it up in the official spec (which I luckily never had to bother with so far) and there this attribute in the header is called nr of samples in each data record too.
IMO getSampleFrequency now does not anymore do what its name advertises.
This is related to #121 and #117
I totally understand the confusion about the two terms, and I agree that it should be made clearer. In my opinion, sample frequency is colloquially understood as samples per second, and should also in pyedflib be treated as such. We need to think about how to implement this without making breaking changes again.
The underlying C-library edflib names it sample_frequency but means smp_per_record. At the moment we call edf_set_samplefrequency which simply sets the smp_in_datarecord parameter in the C code. I think we should deviate from the way edflib names it and use SampleFrequency the way it is usually understood (i.e. per second) and additionally make a method to set/get the smp_per_record attribute. However, there is also datarecord_duration, and I need to look further into how these two are actually combined when writing/reading the file. Nevertheless, the two should be set-able independently.
Current goal should be:
- getSampleFrequency should return the calculated sample frequency (smp/second) per channel
smp_per_recordandrecord_durationshould be calculated in the background accordingly.
Additionally, these changes need to be accurately reflected when writing new files that have a record length of !=1second, and in highlevel.write_edf the block_size parameter should be renamed to accordingly reflect this change.
@holgern what are your thoughts on this?
My two cents: In my library, edflib, I did the calculation to give the conventional use of sample frequency
def samplefrequency(self, channel):
return (<double>self.hdr.signalparam[channel].smp_in_datarecord / self.hdr.datarecord_duration) * EDFLIB_TIME_DIMENSION
@cleemesser yes, that is also how we will calculate it once #159 is reviewed and merged

btw: I still encourage you to join in on pyedflib instead of working on your fork :) I think pooling resources will be much more efficient than working on two separate projects
Thank you @skjerns for taking time and effort to fix this. 👏