ENH: improve test coverage and speed
Currently, adding CIs may be difficult, but a first step would be to have unit tests that assume consistent local data paths. My suggestion would be to test with one subject that has all relevant data and to assume that all data are unpacked correctly.
Coverage is currently around 58 percent, so there is clearly room for improvement.
One important goal is to figure out how to increase coverage while downloading less data; currently we risk hitting disk space limits. I therefore turned off the Python 2 build and downloaded less raw data, since testing the raw IO does not depend much on the `data_type`, whereas a large proportion of the epochs IO testing is about task-specific timings and the like.
@Eric89GXL I created https://github.com/dengemann/make-mne-hcp-testing-data.git to make lightweight testing data. With the parameters found there we can shrink the data from 23GB to 500MB. However, I'm fighting with some failing tests. There is a section on a "weird bug" for reconstructed times that exposes part of the issue. It seems we run into problems with the way mne-python currently constructs `epochs.times` from `tmin`, `data.shape[2]`, and `sfreq`: everything is converted to the sample domain via rounding and then converted back to times. With the data we find in the matfile we would expect something like `np.linspace(tmin, tmax, data.shape[2])`, which works reasonably well. Is this an MNE issue, or do you see any other obvious problem here?
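To make the discrepancy concrete, here is a minimal sketch with made-up numbers (the `sfreq`, `tmin`, and sample count are illustrative assumptions, not the actual HCP values): rounding through the sample domain yields a times vector offset from the linspace over the stored `tmin`/`tmax`.

```python
import numpy as np

# Made-up values for illustration only (not the actual HCP numbers).
sfreq = 508.63            # hypothetical sampling rate after decimation
tmin, n_samples = -1.5, 1018

# MNE-style reconstruction: snap tmin to the nearest sample, convert back.
first_samp = int(round(tmin * sfreq))
times_mne = np.arange(first_samp, first_samp + n_samples) / sfreq

# matfile-style expectation: linear ramp between the stored tmin and tmax.
tmax = tmin + (n_samples - 1) / sfreq
times_mat = np.linspace(tmin, tmax, n_samples)

print(np.abs(times_mne - times_mat).max())  # constant sub-sample offset
```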
@Eric89GXL the current "fix" would be to overwrite `epochs.times` with the times from the matfile after construction of the Epochs object.
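A minimal sketch of that workaround with synthetic data (the channel name, shapes, and a writable `epochs.times` attribute are assumptions of this sketch; in mne-hcp the vector would come from the matfile rather than from `np.linspace`):

```python
import numpy as np
import mne

sfreq, tmin, n_samples = 508.63, -1.5, 1018
info = mne.create_info(['MEG0111'], sfreq, 'mag')
data = np.zeros((1, 1, n_samples))        # 1 epoch, 1 channel, n_samples
events = np.array([[0, 0, 1]])
epochs = mne.EpochsArray(data, info, events, tmin=tmin)

# Stand-in for the times vector read from the HCP matfile.
mat_times = np.linspace(tmin, tmin + (n_samples - 1) / sfreq, n_samples)
epochs.times = mat_times  # assumes times is writable in this MNE version
```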
`np.linspace` is going to be the wrong thing to do unless a similar `np.linspace(...)` was used to subselect data points from the array, which is weird, but possible. ... and doing the `np.linspace` approach you'd probably end up with some strange `sfreq` ... and if you actually did it that way, you'd probably end up needing to do a `round` of the `linspace`, which would then make it wrong again. Yuck.
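A quick numeric sketch of both problems, again with made-up numbers: the `sfreq` implied by the linspace differs from the nominal one, and rounding the linspace times into the sample domain and back does not round-trip.

```python
import numpy as np

sfreq = 508.63                             # nominal rate (assumption)
times = np.linspace(-1.5, 0.5, 1018)       # linspace-derived times vector

# The spacing implies a slightly different ("strange") sfreq.
print(1 / np.diff(times).mean())           # ~508.5, not 508.63

# Rounding into the sample domain and back does not recover `times`.
samples = np.round(times * sfreq).astype(int)
print(np.abs(times - samples / sfreq).max())  # up to half a sample period
```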
@Eric89GXL any idea what could be the source of the trouble here? Maybe it is indeed the way HCP has decimated their data (factor 4).
They must decimate using some other method. Is their code available for you to look at?
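One way the decimation method can bite, sketched with hypothetical numbers: decimating by 4 leaves four possible sample grids depending on the starting offset, and each grid implies a different effective `tmin`.

```python
import numpy as np

sfreq = 2034.51             # approximate native HCP MEG rate (assumption)
tmin, n = -1.5, 2000
times = tmin + np.arange(n) / sfreq

# Each starting offset of a ::4 decimation keeps a different time grid.
for offset in range(4):
    print(offset, times[offset::4][0])
```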
@dengemann I'm surprised that the hack I suggested did not work. Just looking at the private repo we have on GitHub (for autoreject), I can find something like this:
```python
tmin = -1.5
tmin = tmin + 1 / evoked.info['sfreq']
evoked.crop(tmin=tmin)
```
If the `tmin` is the same and the `sfreq` is the same, I do not understand why you would not have the same `times`....
Once this works, we can look into the decimation problem. I understand that there are two confounding problems: decimation and `tmin`.
Thanks @jasmainak, I'll move this to another discussion. In the meantime I forced our readers to use the MATLAB times vectors to be consistent with the HCP data.
So the good news is: we've got fast Travis builds (30 seconds). This means dev should be fun now!
https://travis-ci.org/mne-tools/mne-hcp/builds/185756666
Now that things go fast, we can get back to the remaining aspect of this issue: as far as I can see, our test coverage has decreased as a result of changing the tests. I think it's because I declared the anatomy tests as slow; they should add some 20 seconds or so, and maybe we can now live with that. The other good news is that we could again spawn different builds for different versions of Python etc. Contributions welcome.