reading issues (sfreq) introduced in recent updates?
Hey @cbrnr and @jamieforth, I am getting this warning, and it also shows up, for example, in this MNE example: https://mne.tools/stable/auto_examples/io/read_xdf.html
I don't really know what to make of it. What does it mean? What should I do as a user? Does this point to flaws in my recording setup? Does this mean the data is not usable in some way?
Originally posted by @sappelhoff in https://github.com/xdf-modules/pyxdf/pull/131#discussion_r2465228340
I can confirm that files that were "no problem" before upgrading to pyxdf v1.17.1 have become problematic. Most importantly, the effective sfreq calculation is off.
Perhaps I can get a chance to record a minimal example later to be shared here. In any case, I suspect some issues were introduced, and I am still figuring out where.
Actually, perhaps the XDF file used in this MNE example can already serve as an example. Note that the pyxdf warning about the effective sfreq appears only in the example built with the new pyxdf.
See the docs built using the old pyxdf: https://mne.tools/1.8/auto_examples/io/read_xdf.html
Compared to the docs built using the new pyxdf: https://mne.tools/dev/auto_examples/io/read_xdf.html
Thanks @sappelhoff, I didn't see that you had already opened an issue when I replied to your comment in #131. So let's continue discussing this regression here (and I agree it is a regression which we should try to fix ASAP).
We also have an open issue regarding the meaning of the warning "Segments and clock-segments differ" in #137, which we should also address independently (i.e., make it clearer). But first we should fix the regression.
As a side note, displaying this file is super slow on my Mac. I tried both backends (Matplotlib and Qt), and they are both equally sluggish. Maybe the sampling frequency is too high, causing too many data points to be drawn simultaneously? I guess this is an issue with MNE, so I'll report it there.
Edit: Only MNE-Qt-Browser is slow, everything is fine with the Matplotlib backend.
@cbrnr can we yank the PyPI release of 1.17.1? This is a critical bug that will mess up people's analyses. It makes for a bad reputation if people spend a week debugging their data when the real problem is pyxdf 😆
@sappelhoff yes will do!
Update: Done!
Merging #140 would restore the previous behaviour.
```python
import pyxdf
from mne.datasets import misc

fname = misc.data_path() / "xdf" / "sub-P001_ses-S004_task-Default_run-001_eeg_a2.xdf"
streams, header = pyxdf.load_xdf(fname, dejitter_timestamps=True)

print(f"nominal_srate: {float(streams[0]['info']['nominal_srate'][0])} Hz")
print(f"effective_srate: {float(streams[0]['info']['effective_srate']):.2f} Hz")
print(f"segment count: {len(streams[0]['info']['segments'])}")
print(f"clock-segment count: {len(streams[0]['info']['clock_segments'])}")
```
Output with commit e5e50a2a (#140):

```
nominal_srate: 10000.0 Hz
effective_srate: 10000.45 Hz
segment count: 1
clock-segment count: 1
```

Output with commit 1ef0ad9f (v1.17.1):

```
Stream 1: Calculated effective sampling rate 4765.1440 Hz is different from specified rate 10000.0000 Hz.
Stream 1: Segments and clock-segments differ
nominal_srate: 10000.0 Hz
effective_srate: 4765.14 Hz
segment count: 217890
clock-segment count: 1
```
The issue boils down to how we handle/segment out-of-order time-stamps. In this dataset, about 9% of the ground-truth time-stamp intervals are negative. @sappelhoff, is this "normal" in your experience?
```python
import matplotlib.pyplot as plt
import numpy as np
import pyxdf

# fname as defined above
streams, header = pyxdf.load_xdf(fname, dejitter_timestamps=False)
time_stamps = streams[0]["time_stamps"]

# Count non-increasing ground-truth time-stamp intervals.
dec = (np.diff(time_stamps) <= 0).sum()

n = 100
plt.plot(time_stamps[:n] - time_stamps[0])
plt.title(f"Non-monotonic time-stamps: {dec} ({dec / len(time_stamps):.1%})")
plt.xlabel("sample")
plt.ylabel("time (s)")
plt.vlines(
    np.where(np.diff(time_stamps[:n]) < 0)[0],  # positions of negative intervals
    0.0,
    1.0,
    transform=plt.gca().get_xaxis_transform(),  # lines span the full axis height
    alpha=0.5,
    colors=plt.cm.tab20.colors[3],
    label="segment",
)
plt.legend()
plt.show()
```
The critical bit of code is the dejitter segmentation logic.
In v1.17.0, negative time intervals never triggered a segment break:
```python
b_breaks = diffs > np.max(
    (threshold_seconds, threshold_samples * stream.tdiff)
)
```
In #127 we tried `abs(diffs)` to segment at both positive and negative intervals above a threshold, but it can lead to some weird edge cases, so we dropped it:
```python
b_breaks = np.abs(diffs) > np.max(
    (threshold_seconds, threshold_samples * stream.tdiff)
)
```
In v1.17.1 we always segment at negative intervals. This is theoretically nice, but it is a breaking change for datasets with out-of-order time-stamps:
```python
b_breaks = (diffs < 0) | (
    diffs > np.max((threshold_seconds, threshold_samples * stream.tdiff))
)
```
#140 goes back to the `abs(diffs)` compromise, so it is probably the way to go.
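To make the difference concrete, here is a small self-contained comparison of the three break conditions on synthetic time-stamp differences (the data and the local `tdiff` are made up for illustration; the conditions are copied from the snippets above):

```python
import numpy as np

# Synthetic time-stamp differences: regular 1 ms intervals, one small
# negative interval (an out-of-order sample), and one large 2 s gap.
diffs = np.array([0.001, 0.001, -0.0005, 0.001, 2.0, 0.001])

tdiff = 0.001            # nominal sampling interval (1 / nominal_srate)
threshold_seconds = 1.0  # illustrative threshold values
threshold_samples = 500
limit = np.max((threshold_seconds, threshold_samples * tdiff))

breaks_v1_17_0 = diffs > limit                   # v1.17.0: negatives never break
breaks_pr140 = np.abs(diffs) > limit             # #127 / #140: magnitude-based
breaks_v1_17_1 = (diffs < 0) | (diffs > limit)   # v1.17.1: every negative breaks

for name, b in [
    ("v1.17.0", breaks_v1_17_0),
    ("#140", breaks_pr140),
    ("v1.17.1", breaks_v1_17_1),
]:
    print(f"{name}: {b.sum() + 1} segment(s)")
# v1.17.0: 2 segment(s)  -- only the 2 s gap breaks
# #140:    2 segment(s)  -- the small negative interval stays below the threshold
# v1.17.1: 3 segment(s)  -- the negative interval always breaks
```

On a recording where roughly 9% of intervals are negative, the v1.17.1 condition adds one segment per negative interval, which is consistent with the jump from 1 to 217890 segments in the outputs above. (The thresholds correspond to the `jitter_break_threshold_seconds` and `jitter_break_threshold_samples` arguments of `load_xdf`.)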
Thanks for the nice writeup! When you say that "we tried abs(diffs) to segment positive and negative intervals above a threshold, but it can lead to some weird edge cases", does that mean that #140 does not have those weird edge cases?
Also, it looks like out-of-order timestamps could be more common than we think, so we should be very careful when introducing new behavior and/or info/warning messages. In #131 you suggest the following clarification:
- If `len(segments) > len(clock_segments)`: "Segment (n=X) and clock segments (n=Y) differ: this likely means there are gaps in the data, or time-stamps are not strictly ordered. You might want to inspect the ground-truth time-stamps (dejitter_timestamps=False) and choose appropriate dejitter thresholds to optimise the segmentation."
- If `len(segments) < len(clock_segments)`: "Segment (n=X) and clock segments (n=Y) differ: this means clock resets have been detected but there are no corresponding data segments. You might want to inspect the clock offsets and time-stamp data if you were not expecting this."
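As an aside, here is how a user might act on that first hint (a minimal sketch assuming the existing `jitter_break_threshold_seconds` and `jitter_break_threshold_samples` keyword arguments of `load_xdf`; `fname` as defined above):

```python
import pyxdf

# Inspect the ground-truth time-stamps first (no dejittering applied).
streams, header = pyxdf.load_xdf(fname, dejitter_timestamps=False)

# Then reload with dejittering and explicitly chosen break thresholds.
streams, header = pyxdf.load_xdf(
    fname,
    dejitter_timestamps=True,
    jitter_break_threshold_seconds=1.0,
    jitter_break_threshold_samples=500,
)
```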
I think this proposed wording is better, but people will probably not know what segments and clock segments are, so maybe we should add a short introductory explainer? Also, I think it should be "Number of segments and clock segments differ...", right? But how are segments and clock segments defined/calculated?
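For illustration, a minimal sketch of how data segments can be derived from the break mask discussed above (an assumed reconstruction, not necessarily pyxdf's exact implementation; clock segments would be derived analogously from detected clock resets):

```python
import numpy as np

# Break mask over the n-1 time-stamp differences of an n-sample stream:
# b_breaks[i] == True means a new segment starts at sample i + 1.
b_breaks = np.array([False, False, True, False, True, False])

starts = np.concatenate(([0], np.where(b_breaks)[0] + 1))
stops = np.concatenate((np.where(b_breaks)[0], [len(b_breaks)]))

# Segments as inclusive (first_sample, last_sample) index pairs.
segments = list(zip(starts, stops))
print(segments)  # [(0, 2), (3, 4), (5, 6)]
```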
@jamieforth XDF diagnostics like this could be very useful in MNELAB, which already includes decent XDF features like header/footer inspection and chunk diagnostics. If you are interested, you could contribute a timestamp inspection feature that, e.g., shows the plot you created, includes helpful hints, etc.