Change speed of streaming display to match audio speed

Open AMITKESARI2000 opened this issue 2 years ago • 4 comments

Hi, currently when I stream the predicted labels (as shown in your README), the stream runs almost 2x faster than the actual audio. Is there a way to sync the two and play the audio and the stream together?

%matplotlib notebook

import os

out_dir = os.path.join(data_dir, 'pred_rttms')
os.makedirs(out_dir, exist_ok=True)

# Initialise the streaming pipeline
config = PipelineConfig(
    step=0.5,
    latency=0.5,
    tau_active=0.555,
    rho_update=0.422,
    delta_new=1.517
)
pipeline = OnlineSpeakerDiarization(config)

audio_source = FileAudioSource(audio_file, config.sample_rate, config.duration, config.step)

inference = RealTimeInference(out_dir, do_plot=True)

print("Starting Stream...Click Audio.")
inference(pipeline, audio_source)


I also tried increasing to step=1.0, latency=1.0 to get them in sync, but then it fails with the errors below, ending in a FileNotFoundError (note: it runs perfectly fine with step=0.5, latency=0.5):

/root/anaconda3/lib/python3.9/site-packages/pyannote/audio/models/blocks/pooling.py:72: UserWarning: Mismatch between frames (279) and weights (293) numbers.
  warnings.warn(
Streaming mbzht:   2%|▏         | 2/127 [00:00<00:22,  5.61it/s]Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.9/site-packages/rx/core/operators/map.py", line 37, in on_next
    result = _mapper(value)
  File "/root/anaconda3/lib/python3.9/site-packages/rx/core/operators/scan.py", line 36, in projection
    accumulation = accumulator(accumulation, x)
  File "/root/anaconda3/lib/python3.9/site-packages/diart/operators.py", line 269, in accumulate
    waveform[state.next_sample:new_next_sample] = value.waveform.data
ValueError: could not broadcast input array from shape (16001,1) into shape (16000,1)
Streaming mbzht:   2%|▏         | 2/127 [00:00<00:22,  5.47it/s]
Streaming mbzht:   0%|          | 0/127 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.9/site-packages/rx/core/operators/do.py", line 66, in _on_completed
    on_completed()
  File "/root/anaconda3/lib/python3.9/site-packages/diart/sinks.py", line 44, in on_completed
    self.patch_rttm()
  File "/root/anaconda3/lib/python3.9/site-packages/diart/sinks.py", line 27, in patch_rttm
    annotation = list(load_rttm(self.path).values())[0]
  File "/root/anaconda3/lib/python3.9/site-packages/pyannote/database/util.py", line 306, in load_rttm
    data = pd.read_csv(
  File "/root/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/root/anaconda3/lib/python3.9/site-packages/pandas/io/common.py", line 789, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/home/DATA/amit_kesari/SD1/diart/sample_audio_rttm/pred_rttms/mbzht.rttm'
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Input In [8], in <cell line: 26>()
     23 inference = RealTimeInference(out_dir, do_plot=True)
     25 print("Starting Stream...Click Audio.")
---> 26 get_ipython().run_line_magic('timeit', 'inference(pipeline, audio_source)')

File ~/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2294, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
   2292     kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2293 with self.builtin_trap:
-> 2294     result = fn(*args, **kwargs)
   2295 return result

File ~/anaconda3/lib/python3.9/site-packages/IPython/core/magics/execution.py:1166, in ExecutionMagics.timeit(self, line, cell, local_ns)
   1163         if time_number >= 0.2:
   1164             break
-> 1166 all_runs = timer.repeat(repeat, number)
   1167 best = min(all_runs) / number
   1168 worst = max(all_runs) / number

File ~/anaconda3/lib/python3.9/timeit.py:205, in Timer.repeat(self, repeat, number)
    203 r = []
    204 for i in range(repeat):
--> 205     t = self.timeit(number)
    206     r.append(t)
    207 return r

File ~/anaconda3/lib/python3.9/site-packages/IPython/core/magics/execution.py:156, in Timer.timeit(self, number)
    154 gc.disable()
    155 try:
--> 156     timing = self.inner(it, self.timer)
    157 finally:
    158     if gcold:

File <magic-timeit>:1, in inner(_it, _timer)

File ~/anaconda3/lib/python3.9/site-packages/diart/inference.py:71, in RealTimeInference.__call__(self, pipeline, source)
     68 # Stream audio through the pipeline
     69 source.read()
---> 71 return load_rttm(rttm_path)[source.uri]

File ~/anaconda3/lib/python3.9/site-packages/pyannote/database/util.py:306, in load_rttm(file_rttm)
    293 names = [
    294     "NA1",
    295     "uri",
   (...)
    303     "NA6",
    304 ]
    305 dtype = {"uri": str, "start": float, "duration": float, "speaker": str}
--> 306 data = pd.read_csv(
    307     file_rttm,
    308     names=names,
    309     dtype=dtype,
    310     delim_whitespace=True,
    311     keep_default_na=False,
    312 )
    314 annotations = dict()
    315 for uri, turns in data.groupby("uri"):

File ~/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    665 kwds_defaults = _refine_defaults_read(
    666     dialect,
    667     delimiter,
   (...)
    676     defaults={"delimiter": ","},
    677 )
    678 kwds.update(kwds_defaults)
--> 680 return _read(filepath_or_buffer, kwds)

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:575, in _read(filepath_or_buffer, kwds)
    572 _validate_names(kwds.get("names", None))
    574 # Create the parser.
--> 575 parser = TextFileReader(filepath_or_buffer, **kwds)
    577 if chunksize or iterator:
    578     return parser

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:933, in TextFileReader.__init__(self, f, engine, **kwds)
    930     self.options["has_index_names"] = kwds["has_index_names"]
    932 self.handles: IOHandles | None = None
--> 933 self._engine = self._make_engine(f, self.engine)

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1217, in TextFileReader._make_engine(self, f, engine)
   1213     mode = "rb"
   1214 # error: No overload variant of "get_handle" matches argument types
   1215 # "Union[str, PathLike[str], ReadCsvBuffer[bytes], ReadCsvBuffer[str]]"
   1216 # , "str", "bool", "Any", "Any", "Any", "Any", "Any"
-> 1217 self.handles = get_handle(  # type: ignore[call-overload]
   1218     f,
   1219     mode,
   1220     encoding=self.options.get("encoding", None),
   1221     compression=self.options.get("compression", None),
   1222     memory_map=self.options.get("memory_map", False),
   1223     is_text=is_text,
   1224     errors=self.options.get("encoding_errors", "strict"),
   1225     storage_options=self.options.get("storage_options", None),
   1226 )
   1227 assert self.handles is not None
   1228 f = self.handles.handle

File ~/anaconda3/lib/python3.9/site-packages/pandas/io/common.py:789, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    784 elif isinstance(handle, str):
    785     # Check whether the filename is to be opened in binary mode.
    786     # Binary mode does not support 'encoding' and 'newline'.
    787     if ioargs.encoding and "b" not in ioargs.mode:
    788         # Encoding
--> 789         handle = open(
    790             handle,
    791             ioargs.mode,
    792             encoding=ioargs.encoding,
    793             errors=errors,
    794             newline="",
    795         )
    796     else:
    797         # Binary mode
    798         handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: '/home/DATA/amit_kesari/SD1/diart/sample_audio_rttm/pred_rttms/mbzht.rttm'

Thanks

AMITKESARI2000 • Jul 27 '22 19:07

This is actually a great question. I've been thinking about a way to sync the plot and the audio since the first release. One easy possibility is to call time.sleep(step_duration) between two chunks, but I'm not sure how to avoid inaccuracies that would snowball over time, although for short audio files it should be OK.

The main reason why I wanted this feature is to play the audio while it's being streamed, but I haven't been able to find a clean way to do this where the plot and audio are in sync. Any thoughts?
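One way to limit the snowballing might be to sleep until an absolute deadline rather than for a fixed interval, so that timing errors in one chunk are absorbed by the next. The following is only a minimal sketch, assuming the chunks come from any Python iterable; `paced`, `clock`, and `sleep` are hypothetical names for illustration and are not part of diart:

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(
    chunks: Iterable[T],
    step_duration: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield chunk i no earlier than start + i * step_duration.

    Deadlines are computed from a single start time, so an inaccurate
    sleep or a slow chunk does not shift every later deadline.
    """
    start = clock()
    for i, chunk in enumerate(chunks):
        deadline = start + i * step_duration
        remaining = deadline - clock()
        if remaining > 0:
            # Ahead of schedule: wait for this chunk's deadline.
            sleep(remaining)
        # Behind schedule: emit immediately and catch up on later chunks.
        yield chunk
```

Wrapping the chunk iterator as `paced(chunks, step_duration=0.5)` would release one chunk every 0.5 s of wall-clock time. The `clock` and `sleep` parameters exist only so the pacing can be checked with a fake clock.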

On the other hand, the error you posted seems to be here in buffer_output:

File "/root/anaconda3/lib/python3.9/site-packages/diart/operators.py", line 269, in accumulate
    waveform[state.next_sample:new_next_sample] = value.waveform.data
ValueError: could not broadcast input array from shape (16001,1) into shape (16000,1)

There must be a rounding error or a corner case with the latency and step values you chose. I'll dig deeper into this as soon as I have some free time. The good news is that this buffering is only needed for plotting (to be efficient with RAM usage). A quick fix that should make it work is to replace buffer_output with accumulate_output in RealTimeInference.__call__ (line 61).
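It is not confirmed that this is what happens in diart, but an off-by-one such as 16001 vs 16000 can typically appear when float timestamps are converted to sample indices separately for each chunk. A sketch of the usual alternative, converting the duration and step to integer sample counts once and doing all later arithmetic in samples; `chunk_bounds` and its parameters are hypothetical and not diart's API:

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int, sample_rate: int, duration: float, step: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of sliding chunks.

    Durations are rounded to samples exactly once, so every chunk has
    the same length and consecutive chunks are exactly one hop apart.
    """
    chunk_len = round(duration * sample_rate)
    hop_len = round(step * sample_rate)
    bounds = []
    start = 0
    # Keep only chunks that fit entirely inside the signal.
    while start + chunk_len <= num_samples:
        bounds.append((start, start + chunk_len))
        start += hop_len
    return bounds
```

With boundaries built this way, a buffer slice sized as one hop always matches the incoming data, whatever step and latency are chosen.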

juanmc2005 • Jul 27 '22 20:07

To stream them together, HoloViews and Panel (from Anaconda) could be tried (example blog, Stack Overflow).

AMITKESARI2000 • Jul 28 '22 07:07

Interesting! Have you played a bit with it and diart? I would love to see something like this in the project!

juanmc2005 • Jul 28 '22 12:07

We could draw the diarization part with Segments and update it with streams

juanmc2005 • Jul 28 '22 13:07