diart
Change speed of streaming display to match audio speed
Hi, currently when I try to stream the predicted labels (as shown in your README), the streaed display runs almost 2x faster than the actual audio. Is there any way to sync and play both the audio and the stream together?
%matplotlib notebook

import os

# Imports assumed for diart ~0.3; module paths may differ in other versions
from diart.pipelines import OnlineSpeakerDiarization, PipelineConfig
from diart.sources import FileAudioSource
from diart.inference import RealTimeInference

out_dir = os.path.join(data_dir, 'pred_rttms')
os.makedirs(out_dir, exist_ok=True)

# Initialise the streaming pipeline
config = PipelineConfig(
    step=0.5,
    latency=0.5,
    tau_active=0.555,
    rho_update=0.422,
    delta_new=1.517
)
pipeline = OnlineSpeakerDiarization(config)
audio_source = FileAudioSource(audio_file, config.sample_rate, config.duration, config.step)
inference = RealTimeInference(out_dir, do_plot=True)
print("Starting Stream...Click Audio.")
inference(pipeline, audio_source)
I also tried increasing to step=1.0, latency=1.0 to try to get them in sync, but then it starts giving me a FileNotFoundError as below (note: it runs perfectly fine with step=0.5, latency=0.5):
/root/anaconda3/lib/python3.9/site-packages/pyannote/audio/models/blocks/pooling.py:72: UserWarning: Mismatch between frames (279) and weights (293) numbers.
warnings.warn(
Streaming mbzht: 2%|▏ | 2/127 [00:00<00:22, 5.61it/s]Traceback (most recent call last):
File "/root/anaconda3/lib/python3.9/site-packages/rx/core/operators/map.py", line 37, in on_next
result = _mapper(value)
File "/root/anaconda3/lib/python3.9/site-packages/rx/core/operators/scan.py", line 36, in projection
accumulation = accumulator(accumulation, x)
File "/root/anaconda3/lib/python3.9/site-packages/diart/operators.py", line 269, in accumulate
waveform[state.next_sample:new_next_sample] = value.waveform.data
ValueError: could not broadcast input array from shape (16001,1) into shape (16000,1)
Streaming mbzht: 2%|▏ | 2/127 [00:00<00:22, 5.47it/s]
Streaming mbzht: 0%| | 0/127 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.9/site-packages/rx/core/operators/do.py", line 66, in _on_completed
on_completed()
File "/root/anaconda3/lib/python3.9/site-packages/diart/sinks.py", line 44, in on_completed
self.patch_rttm()
File "/root/anaconda3/lib/python3.9/site-packages/diart/sinks.py", line 27, in patch_rttm
annotation = list(load_rttm(self.path).values())[0]
File "/root/anaconda3/lib/python3.9/site-packages/pyannote/database/util.py", line 306, in load_rttm
data = pd.read_csv(
File "/root/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/common.py", line 789, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/home/DATA/amit_kesari/SD1/diart/sample_audio_rttm/pred_rttms/mbzht.rttm'
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Input In [8], in <cell line: 26>()
23 inference = RealTimeInference(out_dir, do_plot=True)
25 print("Starting Stream...Click Audio.")
---> 26 get_ipython().run_line_magic('timeit', 'inference(pipeline, audio_source)')
File ~/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2294, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
2292 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2293 with self.builtin_trap:
-> 2294 result = fn(*args, **kwargs)
2295 return result
File ~/anaconda3/lib/python3.9/site-packages/IPython/core/magics/execution.py:1166, in ExecutionMagics.timeit(self, line, cell, local_ns)
1163 if time_number >= 0.2:
1164 break
-> 1166 all_runs = timer.repeat(repeat, number)
1167 best = min(all_runs) / number
1168 worst = max(all_runs) / number
File ~/anaconda3/lib/python3.9/timeit.py:205, in Timer.repeat(self, repeat, number)
203 r = []
204 for i in range(repeat):
--> 205 t = self.timeit(number)
206 r.append(t)
207 return r
File ~/anaconda3/lib/python3.9/site-packages/IPython/core/magics/execution.py:156, in Timer.timeit(self, number)
154 gc.disable()
155 try:
--> 156 timing = self.inner(it, self.timer)
157 finally:
158 if gcold:
File <magic-timeit>:1, in inner(_it, _timer)
File ~/anaconda3/lib/python3.9/site-packages/diart/inference.py:71, in RealTimeInference.__call__(self, pipeline, source)
68 # Stream audio through the pipeline
69 source.read()
---> 71 return load_rttm(rttm_path)[source.uri]
File ~/anaconda3/lib/python3.9/site-packages/pyannote/database/util.py:306, in load_rttm(file_rttm)
293 names = [
294 "NA1",
295 "uri",
(...)
303 "NA6",
304 ]
305 dtype = {"uri": str, "start": float, "duration": float, "speaker": str}
--> 306 data = pd.read_csv(
307 file_rttm,
308 names=names,
309 dtype=dtype,
310 delim_whitespace=True,
311 keep_default_na=False,
312 )
314 annotations = dict()
315 for uri, turns in data.groupby("uri"):
File ~/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306 warnings.warn(
307 msg.format(arguments=arguments),
308 FutureWarning,
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
665 kwds_defaults = _refine_defaults_read(
666 dialect,
667 delimiter,
(...)
676 defaults={"delimiter": ","},
677 )
678 kwds.update(kwds_defaults)
--> 680 return _read(filepath_or_buffer, kwds)
File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:575, in _read(filepath_or_buffer, kwds)
572 _validate_names(kwds.get("names", None))
574 # Create the parser.
--> 575 parser = TextFileReader(filepath_or_buffer, **kwds)
577 if chunksize or iterator:
578 return parser
File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:933, in TextFileReader.__init__(self, f, engine, **kwds)
930 self.options["has_index_names"] = kwds["has_index_names"]
932 self.handles: IOHandles | None = None
--> 933 self._engine = self._make_engine(f, self.engine)
File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1217, in TextFileReader._make_engine(self, f, engine)
1213 mode = "rb"
1214 # error: No overload variant of "get_handle" matches argument types
1215 # "Union[str, PathLike[str], ReadCsvBuffer[bytes], ReadCsvBuffer[str]]"
1216 # , "str", "bool", "Any", "Any", "Any", "Any", "Any"
-> 1217 self.handles = get_handle( # type: ignore[call-overload]
1218 f,
1219 mode,
1220 encoding=self.options.get("encoding", None),
1221 compression=self.options.get("compression", None),
1222 memory_map=self.options.get("memory_map", False),
1223 is_text=is_text,
1224 errors=self.options.get("encoding_errors", "strict"),
1225 storage_options=self.options.get("storage_options", None),
1226 )
1227 assert self.handles is not None
1228 f = self.handles.handle
File ~/anaconda3/lib/python3.9/site-packages/pandas/io/common.py:789, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
784 elif isinstance(handle, str):
785 # Check whether the filename is to be opened in binary mode.
786 # Binary mode does not support 'encoding' and 'newline'.
787 if ioargs.encoding and "b" not in ioargs.mode:
788 # Encoding
--> 789 handle = open(
790 handle,
791 ioargs.mode,
792 encoding=ioargs.encoding,
793 errors=errors,
794 newline="",
795 )
796 else:
797 # Binary mode
798 handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/DATA/amit_kesari/SD1/diart/sample_audio_rttm/pred_rttms/mbzht.rttm'
Thanks
This is actually a great question. I've been thinking about a way to sync the plot and the audio since the first release.
One easy possibility is to time.sleep(step_duration) between two chunks, but I'm not sure how to avoid inaccuracies that would snowball over time, although for short audio files it should be ok.
The main reason I wanted this feature is to play the audio while it's being streamed, but I haven't found a clean way to keep the plot and the audio in sync. Any thoughts?
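One way to keep a naive per-chunk sleep from drifting is to pace each chunk against absolute wall-clock deadlines instead of sleeping a fixed step every time, so small per-iteration overheads don't accumulate. A minimal sketch of the idea (the `emit_chunk` callback is hypothetical, not part of diart):

```python
import time

def paced_stream(chunks, step_duration, emit_chunk):
    """Emit chunks at real-time pace without cumulative drift.

    Instead of time.sleep(step_duration) between chunks (where small
    overheads snowball), each chunk is scheduled at the absolute
    deadline start + i * step_duration, and we only sleep whatever
    time remains until that deadline.
    """
    start = time.monotonic()
    for i, chunk in enumerate(chunks):
        deadline = start + i * step_duration
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # sleep only the remaining time, if any
        emit_chunk(chunk)
```

If processing a chunk occasionally takes longer than `step_duration`, the loop simply skips the sleep and catches up on the next deadline, so the schedule stays anchored to real time.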
On the other hand, the error you posted seems to come from here, in buffer_output:
File "/root/anaconda3/lib/python3.9/site-packages/diart/operators.py", line 269, in accumulate
waveform[state.next_sample:new_next_sample] = value.waveform.data
ValueError: could not broadcast input array from shape (16001,1) into shape (16000,1)
There must be a rounding error or a corner case with the latency and step values you chose. I'll dig deeper into this as soon as I have some free time.
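For what it's worth, cumulative rounding alone can produce exactly this kind of off-by-one chunk: if chunk boundaries are derived by rounding `i * step * sample_rate` while the accumulator assumes a fixed `int(step * sample_rate)` samples per chunk, some chunks come out one sample larger, and a slice assignment like `waveform[a:b] = chunk` then fails to broadcast. A toy illustration of the arithmetic (not diart's actual code; the `step` value is made up):

```python
# Toy illustration: rounding cumulative chunk boundaries vs. assuming a
# fixed chunk size can differ by one sample per chunk.
sample_rate = 16000
step = 0.10005  # hypothetical step where step * sample_rate is not an integer

fixed = int(step * sample_rate)  # what a fixed-size buffer slice expects
boundaries = [round(i * step * sample_rate) for i in range(10)]
sizes = [b - a for a, b in zip(boundaries, boundaries[1:])]

# Most chunks have `fixed` samples, but some have one extra sample,
# which would overflow a slice sized for `fixed` samples.
print(fixed, sorted(set(sizes)))  # → 1600 [1600, 1601]
```

The same effect can also come from a resampler that rounds durations up, so the fix likely needs to either trim incoming chunks to the expected size or size the buffer slices from the actual chunk lengths.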
The good news is that this buffering is only needed for plotting (to be efficient with RAM usage). A quick fix that should make it work is to replace buffer_output with accumulate_output in RealTimeInference.__call__ (line 61).
For streaming them together, HoloViews and Panel (developed by Anaconda) could be used and tried out (example blog) (stackoverflow).
Interesting! Have you played a bit with it and diart? I would love to see something like this in the project!