Velcolfix branch (for velocity columns memory uninitialization? in linux version 4.2.1197)
This does several things:
- clarifies the meaning of the various sample columns while they are being read in edf_read.py from the edf_get_float_data() by reading them into a separate Nx1 array linked to the column name key in a dictionary rather than an NxC numpy ndarray. dtypes are the native dtypes returned by the FSAMPLE struct, rather than the all-float version previously returned. One can afterwards run "convert_dtypes" to convert to more robust datatypes if necessary.
- fixes the number_elements problem (already fixed in main, but necessary to fix again if reading method is changed) by resizing each of the Nx1 arrays to the number of samples read.
- Adds an "exclude_vel_cols" option in parse() to read_edf, default True, which sets all velocity-related sample columns to NaN. Does not effect events. This is to temporarily address the seeming memory uninitialization/struct offset error (cause uncertain) observed in linux version 4.2.1197 at least. I am not sure how to add "tests" for this in pytest since it requries running multiple processes and is nondeterministic... I added several directories to the tests/ directories which contain python scripts to check this (in R using eyelinkReader, and python using pyedfread)
- Adds an "ftime" option to read_edf, which converts the "Time" column to floating point (from default uint32) and adds 0.5 seconds for appropriate rows based on flag. I suppose I need to add a test for this, which requires adding an EDF file recorded at 2000 Hz and a way of identifying what is correct (via e.g. ASC from edf2asc). I will prepare this later...
Probably better to wait on what Sam and SR research say, this is a temporary fix at best (although it includes many other updates to the code).
Sam has said that it is indeed on their end, and they will release a new version. I will get a testing version and start testing. But, we should add a "version" check to our code, and for any version equal to or earlier than the current (buggy) version, we should provide NaNs for the affected columns.
Do you have windows environment to test with? I have never used the windows version of EDFAPI, but checking may be recommended...
Great! Yes I have a windows environment to test (and Ubuntu, but you got that covered)
OK great. Are you able to run and recreate the behavior I observed on windows (i.e. the output does not match in the VEL columns over mutliple runs?
Is the version number of the windows version the same as the Ubuntu version? (that 4.2.11... number?)
I think the criticial result I found is that the velocity columns are broken anyway
I found a version 4.2.762.0 for the edfapi.dll - not entirely sure how to find out the version
OK so on windows you found some file named 4.2.762.0?
Sorry, please clarify -- You found the velocity columns are "broken anyway"? What do you mean?
You mean they contain weird values? (this is issue #1).
The other issue is that the weird values are not stable across different repetitions for the same EDF file (issue #2).
Yea, I think we can wrap the "version" function and see if that gives useful information in our library. And if that is properly updated we can use that. Because there is no way to diagnose if the data is safe based on behavior since it is effectively random values in memory.
Honestly probably the safest thing to do is simply blank out affected columns with NANs. That way we guarantee that no one gets "incorrect" data, which to me is worse than no data. In fact it is better to simply not include the columns at all.
On windows I found the edfapi.dll and in the properties it says that version number :shrug:
For me these two issues (broken values and non-repeatable) are the same underlying issue, e.g. the second follows from the first. Sorry for not being clear. I didnt check the repeatedness yet, bit busy unfortunately right this moment.
Detecting the version + filling with NaNs + Warning seems best! Not sure I'd remove the columns though, that would be breaking
hey Richard, could you quickly state if this should be merged or not? Sam from SR-Research told me the bug should be fixed in newer SDK versions, but I didnt check yet
Sorry, the bug is for the velocity columns, but the original issue in pyedfread (reading too many tokens/rows) was not fixed by the SDK update (it was an error in pyedfread, not the edfapi lib).
This update adds some default arguments to parse(), so it results in the same original behavior by default, but also gives options to exclude velocity data if you know your version is broken etc.
But, this is not clean code, if you will merge to the mainline, let me do a clean checkout and fork and only one checkin.