Controlling a sound delivery onset through the sd.OutputStream object for human research
Hello @mgeier! First of all, thank you for the continuous development and maintenance of sounddevice, it's amazing :smiley:
Short version
How could I further improve the callback, the `OutputStream` argument settings, ... to precisely start delivering an auditory stimulus at a given time, for a sounddevice backend used to deliver auditory stimuli in human neuroscience research? Code at the end.
Long version
You might know of Psychtoolbox and PsychoPy, tools to design psychological, behavioral, and neuroscience experiments. I work primarily with the latter, where delivering an auditory stimulus is very often used to elicit a neurological or behavioral response in a participant. Historically, Psychtoolbox offers an excellent interface to PortAudio with which I can deliver an audio stimulus simultaneously with a trigger (an event to mark the audio onset, usually delivered via parallel port) with excellent precision (delay between the event and the audio onset measured at < 1 ms).
In Python, this interface is accessible through PsychoPy and its `SoundPTB` object. PsychoPy also offers other backends, including `SoundDeviceSound`. Sadly, the other backends do not match the performance of `SoundPTB` (the delay between the event marking the sound onset and the actual sound onset is non-zero and variable). On top of that, PsychoPy development tends to regularly break the most basic features, including sound delivery in their latest 2024.2.0 and 2024.2.1 versions.
Bringing me to today's affairs: I gave a shot at implementing a Python audio delivery backend matching `SoundPTB` performance using sounddevice. The objective is to package this backend independently from PsychoPy to avoid breakage, and to replace the existing PsychoPy `SoundDeviceSound` object with a wrapper around this new backend.
The key element which makes the delivery precise in `SoundPTB` is a scheduling mechanism. Typically, this is how a sound and a trigger (event) would be delivered:

```python
from time import sleep  # placeholder: usually a higher-precision sleep is used

import psychtoolbox as ptb
from byte_triggers import ParallelPortTrigger
from psychopy.sound.backend_ptb import SoundPTB

# create a 440 Hz sinusoid lasting 200 ms
sound = SoundPTB(value=440.0, secs=0.2)
# create the parallel port trigger object
trigger = ParallelPortTrigger("arduino")
# schedule the sound in 200 ms, wait, deliver the trigger
now = ptb.GetSecs()  # psychtoolbox uses a separate clock
sound.play(when=now + 0.2)
sleep(0.2)  # usually not the default 'time.sleep', which is too imprecise
trigger.signal(1)  # mark the onset
```
I tried replicating this scheduling idea with sounddevice, compensating for delays with the `time.outputBufferDacTime` and `time.currentTime` values obtained in the callback function, and I got very close. This callback approach, compared to a blocking `stream.write`, brings down the delay between the event and the audio onset to ~3 ms ± 1 ms. It is visibly slightly worse than the `SoundPTB` object, especially with a higher variability.
I'm a bit out of ideas on how I could continue to improve this attempt and would love to hear your input on how I could improve the MWE below and how I could set the different `OutputStream` arguments. Without further ado, here is the current working state yielding ~3 ms ± 1 ms.
```python
import time

import numpy as np
import sounddevice as sd
from byte_triggers import ParallelPortTrigger


class Clock:
    """Monotonic clock with nanosecond resolution."""

    def __init__(self) -> None:
        self._t0 = time.monotonic_ns()

    def get_time_ns(self) -> int:
        return time.monotonic_ns() - self._t0


class SoundSD:
    def __init__(self, data: np.ndarray, device: int, sample_rate: int) -> None:
        # store data, device and callback variables
        self._data = data if data.ndim == 2 else data[:, np.newaxis]
        self._clock = Clock()
        self._current_frame = 0
        self._target_time = None
        # create and open the output stream
        self._stream = sd.OutputStream(
            blocksize=0,  # 0, 4, 64, 128 -> no visible differences
            callback=self._callback,
            channels=data.shape[1] if data.ndim == 2 else 1,
            device=device,
            dtype=data.dtype,
            latency="low",
            samplerate=sample_rate,
        )
        self._stream.start()

    def _callback(self, outdata, frames, time_info, status) -> None:
        """Callback audio function."""  # noqa: D401
        if self._target_time is None:  # nothing scheduled, output silence
            outdata.fill(0)
            return
        # estimate when this buffer will hit the DAC, relative to "now"
        delta_ns = int((time_info.outputBufferDacTime - time_info.currentTime) * 1e9)
        if self._clock.get_time_ns() + delta_ns < self._target_time:
            outdata.fill(0)  # too early, keep outputting silence
            return
        end = self._current_frame + frames
        if end <= self._data.shape[0]:
            outdata[:frames, :] = self._data[self._current_frame : end, :]
            self._current_frame += frames
        else:
            # last buffer: zero-pad the remaining frames
            data = self._data[self._current_frame :, :]
            data = np.vstack(
                (
                    data,
                    np.zeros((frames - data.shape[0], data.shape[1]), dtype=data.dtype),
                )
            )
            outdata[:frames, :] = data
            # reset
            self._current_frame = 0
            self._target_time = None

    def play(self, when: float | None = None) -> None:
        """Play the audio data.

        Parameters
        ----------
        when : float | None
            The relative time in seconds at which to start playing the audio data.
            For instance, ``0.2`` will start playing in 200 ms. If ``None``, the
            audio data is played as soon as possible.
        """
        self._target_time = (
            self._clock.get_time_ns()
            if when is None
            else self._clock.get_time_ns() + int(when * 1e9)
        )


if __name__ == "__main__":
    trigger = ParallelPortTrigger("arduino")
    device = sd.query_devices(sd.default.device["output"])
    sample_rate = int(device["default_samplerate"])
    duration = 0.2
    frequency = 440
    times = np.linspace(0, duration, int(duration * sample_rate), endpoint=False)
    data = np.sin(2 * np.pi * frequency * times)
    data /= np.max(np.abs(data))  # normalize
    data = data.astype(np.float32) * 0.1
    sound = SoundSD(data, device["index"], sample_rate)
    sound.play(when=0.5)
    time.sleep(0.5)  # to be replaced with a higher-precision sleep function
    trigger.signal(1)
    time.sleep(0.3)
```
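For reference, this is the kind of higher-precision sleep I have in mind for the comment above: a sketch only, where `precise_sleep` is a hypothetical helper that coarse-sleeps most of the interval and busy-waits the remainder.

```python
import time


def precise_sleep(duration: float) -> None:
    """Sleep for ``duration`` seconds with sub-millisecond precision."""
    target = time.perf_counter() + duration
    # coarse sleep, leaving ~1 ms of margin for OS scheduler inaccuracy
    remaining = target - time.perf_counter()
    if remaining > 0.001:
        time.sleep(remaining - 0.001)
    # busy-wait the final stretch (precise, but burns CPU)
    while time.perf_counter() < target:
        pass
```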
Note: I measure the delay between the sound onset and the trigger through an EEG amplifier sampling at 1 kHz (i.e., 1 ms resolution). I can increase the sampling rate to 16 kHz if needed.
And for completeness, a version with `RawOutputStream`: https://github.com/mscheltienne/stimuli/blob/main/stimuli/audio/_backend/sounddevice.py
It's the same logic, and it is still visibly slightly worse than `SoundPTB`.
Looks like I should look into https://github.com/spatialaudio/python-rtmixer
> You might know of Psychtoolbox and PsychoPy, tools to design psychological, behavioral, and neuroscience experiments.
Yes, I know about them but I'm not an active user.
https://github.com/spatialaudio/python-sounddevice/issues/37
https://github.com/spatialaudio/python-sounddevice/issues/35#issuecomment-263265640
https://github.com/spatialaudio/python-sounddevice/issues/100#issuecomment-324363626
https://github.com/psychopy/psychopy/pull/1312
https://github.com/psychopy/psychopy/pull/1327
https://github.com/psychopy/psychopy/issues/2065
https://github.com/psychopy/psychopy/pull/2529
https://github.com/psychopy/psychopy/issues/4673
The sounddevice backend was added to PsychoPy around 2016, but apparently the latency wasn't good enough ("massive sound delay and jitter") and they added the PTB backend in 2019 which was much better.
I just saw that there was even a "mega-study": https://peerj.com/articles/9414/
> compensating for delays with the `time.outputBufferDacTime` and `time.currentTime` values obtained in the callback function
I have the feeling that those values can be quite inaccurate, maybe depending on host API, drivers and physical hardware.
See also https://stackoverflow.com/questions/78869667/sounddevice-in-python-time-vector-of-input-and-output-signal-synchronized and https://github.com/spatialaudio/python-sounddevice/issues/148#issuecomment-699518237.
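A quick way to check how stable those values are on a given host API / driver / hardware combination might be to log the reported DAC delta across callbacks and look at its spread. A minimal sketch (just an illustration, not a rigorous benchmark):

```python
import numpy as np
import sounddevice as sd

deltas = []


def callback(outdata, frames, time_info, status):
    outdata.fill(0)  # output silence; we only care about the timestamps
    deltas.append(time_info.outputBufferDacTime - time_info.currentTime)


with sd.OutputStream(channels=1, latency="low", callback=callback):
    sd.sleep(1000)  # collect timestamps for one second

deltas = np.asarray(deltas) * 1e3  # seconds -> milliseconds
print(f"DAC delta: mean {deltas.mean():.3f} ms, std {deltas.std():.3f} ms")
```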
> This callback approach compared to a blocking `stream.write` brings down the delay
Yes, the blocking API is definitely bad for jitter, since it quantizes the start times to block boundaries (AFAIK). For example, with a 256-frame blocksize at 48 kHz, start times would only fall on a ~5.3 ms grid.
> brings down the delay between the event and the audio onset to ~3 ms ± 1 ms. It is visibly slightly worse than the `SoundPTB` object, especially with a higher variability.
The variability might be due to the inaccurate `time.outputBufferDacTime`, but also due to the Python interpreter itself.
> And for completeness, a version with `RawOutputStream`
I'm not sure how much overhead is caused by the NumPy conversions, but if you don't need NumPy arrays, it's probably better to use the "raw" streams.
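For illustration, the streaming part of the callback could operate on plain bytes along these lines; this is a rough sketch for mono float32, modeled on the play_long_file.py example from the sounddevice repository (the names and the silent placeholder payload are made up):

```python
import sounddevice as sd

BYTES_PER_FRAME = 4  # mono float32
payload = b"\x00" * 48000 * BYTES_PER_FRAME  # pre-rendered audio as raw bytes
position = 0


def callback(outdata, frames, time_info, status):
    global position
    nbytes = frames * BYTES_PER_FRAME
    chunk = payload[position : position + nbytes]
    outdata[: len(chunk)] = chunk
    # zero-pad once the payload is exhausted
    outdata[len(chunk) :] = b"\x00" * (nbytes - len(chunk))
    position += nbytes


stream = sd.RawOutputStream(
    samplerate=48000, channels=1, dtype="float32", callback=callback
)
stream.start()
```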
> Looks like I should look into https://github.com/spatialaudio/python-rtmixer
Yeah, that's what I was suggesting in my conversation with the PsychoPy devs back in the day, e.g. https://github.com/psychopy/psychopy/issues/2065#issuecomment-433345942.
Note that it is still quite experimental (many TODOs: https://github.com/search?q=repo%3Aspatialaudio%2Fpython-rtmixer+TODO%3A&type=code) and there wasn't much development happening in recent times, but if it seems promising to you, I'm willing to put some more work into it.
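For reference, scheduling a sound with rtmixer would look roughly like this; an untested sketch, where `play_buffer` takes a `start` time on the stream clock and the buffer has to be float32 (see the rtmixer docs for the exact semantics, including `allow_belated`):

```python
import numpy as np
import rtmixer

samplerate = 48000
mixer = rtmixer.Mixer(channels=1, samplerate=samplerate, latency="low")
mixer.start()

# 200 ms of a 440 Hz sinusoid as a float32 buffer
t = np.arange(int(0.2 * samplerate)) / samplerate
buffer = (0.1 * np.sin(2 * np.pi * 440 * t)).astype("float32")

# schedule playback 200 ms into the future, relative to the stream clock
action = mixer.play_buffer(buffer, channels=1, start=mixer.time + 0.2)
```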
BTW, the original idea of the rtmixer module was to provide an example of how to use a callback function implemented in C which can communicate with Python code. The intention was not to provide a fully-featured library itself (but it might still evolve into something like that).
What I want to say with this is that maybe you should consider writing your own callback function in C (perhaps using the rtmixer code as inspiration), plus further C code that can interface with your parallel port trigger device. The goal would be to have no Python code between the trigger and whatever happens in the audio callback.