visbrain
Survey: best hypnogram format
Hi everyone,
This is not really an issue but rather a survey on what people think would be the best format to load and save sleep staging files (hypnograms). As a reminder, Visbrain currently supports two main categories of formats, stage-duration and point-per-second:
The point-per-second category can be further subdivided into 1) the .hyp format (screenshot above) or 2) a .txt file with no header, which MUST be accompanied by a hypno_description.txt file giving the correspondence between the integer values and the sleep stages as well as the sampling frequency of the hypnogram.
I am not entirely satisfied with either of these options. Some issues that I have are:
- The stage-duration format is not super practical because it needs to be converted back to a point-per-second vector in order to apply masking operations (e.g. detecting spindles only on N2 sleep), which I think is hard to do for users who have little or no programming experience.
- For the point-per-second, I think that 1) having a separate extension (.hyp) is not great because beginners may not realize that this is in fact simply a text file. I therefore much prefer a .csv or .txt extension; 2) however, I don't like the current text (.txt) format either because it requires a hypno_description.txt file, i.e. 2 files instead of 1, which is cumbersome and may lead to errors.
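To make the conversion step in the first point concrete, here is a minimal sketch. It assumes the stage-duration data has already been read into a list of stage labels and a list of durations in seconds (the actual Visbrain file layout is not reproduced here), and both `STAGE_VALUES` and `stage_duration_to_vector` are hypothetical names used only for illustration.

```python
import numpy as np

# Hypothetical label-to-integer mapping, used only for this illustration.
STAGE_VALUES = {"Uns": -2, "Art": -1, "W": 0, "N1": 1, "N2": 2, "N3": 3, "REM": 4}


def stage_duration_to_vector(stages, durations_sec, sf_hypno=1.0):
    """Expand (stage, duration) pairs into one integer value per sample.

    sf_hypno is the sampling frequency of the output vector in Hz
    (1.0 gives a point-per-second hypnogram).
    """
    values = np.array([STAGE_VALUES[s] for s in stages])
    # Number of samples covered by each bout
    counts = np.rint(np.asarray(durations_sec, dtype=float) * sf_hypno).astype(int)
    return np.repeat(values, counts)


# Four bouts: 60 s of W, 30 s of N1, 120 s of N2, 30 s of REM
hypno = stage_duration_to_vector(["W", "N1", "N2", "REM"], [60, 30, 120, 30])

# Boolean mask selecting N2 samples, e.g. to restrict a detection to N2 sleep
n2_mask = hypno == STAGE_VALUES["N2"]
```

Once the vector exists, masking is a one-liner, but the expansion itself is the part that may be a hurdle for users with little programming experience.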
I think that one of my preferred formats would be a single text file (.txt or .csv) that looks like:
# Date: Thu Mar 26 15:17:00 2020
# Number of values: 15
# Sampling frequency: 0.03333333333333333 Hz
# Resolution: 30.0 sec
# Duration (seconds): 450.0
# Stage Uns: -2
# Stage Art: -1
# Stage W: 0
# Stage N1: 1
# Stage N2: 2
# Stage N3: 3
# Stage REM: 4
0
1
2
-1
A loading function could then separately read the header and the values to construct a final point-per-sec hypnogram.
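A rough sketch of what such a loading function might look like, assuming every header line has the form `# key: value`, that stage entries are written as `# Stage <label>: <integer>`, and that the sampling frequency line ends with a unit. The function name `load_hypno_with_header` is hypothetical, not an existing Visbrain function, and the example input is a shortened version of the header shown above.

```python
import io

import numpy as np


def load_hypno_with_header(file):
    """Read '# key: value' header lines and integer stage values from one text file."""
    header, values = {}, []
    for line in file:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Split on the first colon only, so dates like 15:17:00 stay intact
            key, _, val = line[1:].partition(":")
            header[key.strip()] = val.strip()
        else:
            values.append(int(line))
    # Build integer value -> stage label from the "Stage X: n" entries
    mapping = {
        int(v): k.split(maxsplit=1)[1]
        for k, v in header.items()
        if k.startswith("Stage ")
    }
    # "0.0333 Hz" -> 0.0333
    sf = float(header["Sampling frequency"].split()[0])
    return np.array(values), sf, mapping


# Shortened example file (assumption: same layout as the proposal above)
example = io.StringIO(
    "# Date: Thu Mar 26 15:17:00 2020\n"
    "# Sampling frequency: 0.03333333333333333 Hz\n"
    "# Stage Art: -1\n"
    "# Stage W: 0\n"
    "# Stage N1: 1\n"
    "# Stage N2: 2\n"
    "0\n1\n2\n-1\n"
)
values, sf, mapping = load_hypno_with_header(example)

# Upsample one value per epoch to one value per second
hypno_sec = np.repeat(values, int(round(1 / sf)))
```

Because the mapping travels with the values, no separate description file is needed.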
An alternative that I've seen in some sleep scoring software is a .csv file with several columns indicating 1) the sleep stage (in string format, e.g. "N1", "N2"), 2) the epoch number and 3) the time at the start of the epoch (e.g. 22:10:30). However, this requires knowing the start time of the recording, which is not always available, especially when working with a NumPy array.
Epoch | Time | Stage |
---|---|---|
0 | 22:00:00 | W |
1 | 22:00:30 | W |
2 | 22:01:00 | N1 |
3 | 22:01:30 | N2 |
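For illustration, a table like this could be generated from a per-epoch vector along the following lines. This is only a sketch: it assumes 30-second epochs and a known recording start time, and the `LABELS` mapping, the `hypno_to_rows` helper and the start date are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical integer-to-label mapping, used only for this illustration.
LABELS = {-2: "Uns", -1: "Art", 0: "W", 1: "N1", 2: "N2", 3: "N3", 4: "REM"}


def hypno_to_rows(hypno, start, epoch_sec=30):
    """Return (epoch number, start time of epoch, stage label) for each epoch."""
    rows = []
    for i, value in enumerate(hypno):
        onset = start + timedelta(seconds=i * epoch_sec)
        rows.append((i, onset.strftime("%H:%M:%S"), LABELS[value]))
    return rows


# The start time must be supplied by the user; it cannot be inferred from the vector
rows = hypno_to_rows([0, 0, 1, 2], datetime(2020, 3, 26, 22, 0, 0))

# Write the rows as CSV text (an in-memory buffer stands in for a file here)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Epoch", "Time", "Stage"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The explicit `start` argument is exactly the piece of information that is missing when only an array is available.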
What do people think? To be clear, I am not saying we should completely change the hypnogram format in Visbrain, but just trying to think of what could be the most convenient format for most users.
Thanks! Raphael
On the one hand, format 1 has many advantages and is effective, but it is cumbersome (I had to deal with this myself at times). On the other hand, most programs work with a format similar to 2, and sleep researchers are quite familiar with it. As we are still mostly scoring in equally spaced epochs (i.e. always 30 seconds), time-based annotations do not add much yet.
I'd vote for a format 2 with # annotations as .txt. Dropping the description file will help to unclutter the folders. The multi-column would just increase human-readability, as the epoch and time information is redundant with the information in the #-header, so I don't think it's necessary.
Maybe there could be a header part about scoring procedure (created by visbrain v1.x at %datetime%), in future this could be used to add information about automatic scoring algos.
I (and @grahamfindlay) have a strong preference for the stage-duration format:
- Using fixed-size epochs and/or using 1s as the atomic unit of sleep is arbitrary and wholly unfit for a lot of types of analysis (that's the main point really).
- No need to extract (or save) the label/value mapping from metadata. No ambiguity about labels/values, which may cause non-failing unintentional substitutions, e.g. when sharing bits of code written with different mappings.
- No loss of accuracy due to downsampling/resampling.
- I agree it may make some operations (such as direct masking) a bit less straightforward in Python, but it facilitates others (such as recovering the start and end time of a specific event/epoch). At the end of the day I suppose it's a choice between manipulating NumPy arrays and pandas DataFrames. The latter are a bit trickier but not the end of the world either, and Sleep already provides functions for converting from one format to the other, so I'd say the other points prevail from our perspective.
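To illustrate the last point, here is a minimal sketch of the reverse conversion, from a per-sample vector to bouts with explicit start and end times. It is a plain run-length encoding and is not the conversion function that Sleep provides; `vector_to_stage_duration` is a hypothetical name.

```python
from itertools import groupby


def vector_to_stage_duration(hypno, sf_hypno=1.0):
    """Run-length encode a per-sample hypnogram into (value, start_sec, end_sec) bouts."""
    bouts, start = [], 0
    for value, run in groupby(hypno):
        n = sum(1 for _ in run)  # number of consecutive samples with this value
        bouts.append((value, start / sf_hypno, (start + n) / sf_hypno))
        start += n
    return bouts


# One value per second: 2 s of W (0), 1 s of N1 (1), 3 s of N2 (2), 1 s of REM (4)
bouts = vector_to_stage_duration([0, 0, 1, 2, 2, 2, 4])

# Start and end time of every N2 bout, read directly from the bout list
n2_bouts = [(start, end) for value, start, end in bouts if value == 2]
```

With the bout representation, recovering event boundaries is a simple filter, whereas with the vector it requires this kind of encoding first.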