voxseg

ValueError raised when audio file has no voice activity

Open a-n-rose opened this issue 4 years ago • 3 comments

First of all, thank you for all of your work. This package is proving to be very helpful.

I have come across what appears to be a bug. If I supply an audio file to Voxseg and no voice activity is identified, this ValueError is thrown:

------------------- Running VAD -------------------
2021-06-08 18:15:50.236547: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-08 18:15:50.236961: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299965000 Hz
Traceback (most recent call last):
  File "voxseg/main.py", line 58, in <module>
    endpoints = run_cnnlstm.decode(targets, speech_thresh, speech_w_music_thresh, filt)
  File "../voxseg/env/lib/python3.8/site-packages/voxseg/run_cnnlstm.py", line 57, in decode
    ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/generic.py", line 5874, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 631, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 673, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1074, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 619, in pandas._libs.lib.astype_intsafe
ValueError: cannot convert float NaN to integer

I suspect that because no voice activity was identified, the time points (i.e. targets['start'] and targets['end']) either do not exist or are NaN, which causes the following code to fail:

From voxseg.run_cnnlstm.decode

    targets['utterance-id'] = targets['recording-id'].astype(str) + '_' + \
                        ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
                        ((targets['end'] * 100).astype(int)).astype(str).str.zfill(7)
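
For anyone who wants to check the suspected cause in isolation, here is a minimal sketch that triggers the same kind of error outside Voxseg. The DataFrame is a hypothetical stand-in for what decode might see when no speech is found: the column names come from the snippet above, but the NaN values are an assumption. The exact error message may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame inside decode(); the NaN endpoints
# are an assumption about what a recording without speech produces.
targets = pd.DataFrame({
    "recording-id": ["rec1"],
    "start": [np.nan],
    "end": [np.nan],
})

try:
    # Same cast as in decode(): a float column containing NaN cannot
    # be converted to int.
    (targets["start"] * 100).astype(int)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this raises a ValueError, it would be consistent with the NaN explanation, though it does not rule out another cause inside Voxseg.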

I have put together a workaround, but I figured others will likely run into this bug at some point. I would also like to know whether it could be due to some cause other than the lack of voice activity.

Many thanks!

a-n-rose, Jun 08 '21 21:06

Hi @a-n-rose

I'm glad to hear you've been finding the package useful.

Thank you for pointing this out. I suspect you are correct that it is caused by no voice activity being identified, which would produce NaN values or an empty DataFrame, as you said. I will do some testing to double-check that there isn't some other underlying issue causing the bug, and then I will apply a fix.

Thanks again for bringing this to my attention!

NickWilkinson37, Jun 09 '21 09:06

Hello @NickWilkinson37

I have attached a wav file that causes ValueError: cannot convert float NaN to integer

Best regards, Anton

24_10_2021-150256.wav.zip

antonakv, Oct 24 '21 13:10

Any updates? I used the following workaround in the decode function in the run_cnnlstm file to avoid this error:

    targets = targets.drop(['predicted-targets'], axis=1)
    # Workaround: replace NaN values with 0 so the astype(int) casts below do not fail
    targets.fillna(value=0, inplace=True)
    targets = targets.apply(pd.Series.explode).reset_index(drop=True)
    targets['utterance-id'] = targets['recording-id'].astype(str) + '_' + \
                        ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
                        ((targets['end'] * 100).astype(int)).astype(str).str.zfill(7)
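
One possible variation, sketched here under assumptions: filling NaN with 0 would presumably leave a row with start and end both at 0 for a recording without speech, whereas dropping the rows that lack endpoints leaves no utterance for it at all. The helper name add_utterance_ids is hypothetical and not part of voxseg. It assumes the frame has already been exploded to one row per segment, with the 'recording-id', 'start' and 'end' columns used above.

```python
import numpy as np
import pandas as pd


def add_utterance_ids(targets: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of voxseg): build utterance ids,
    skipping rows whose start or end time is missing."""
    # Drop rows without valid endpoints instead of filling them with 0.
    targets = targets.dropna(subset=["start", "end"]).reset_index(drop=True)

    # Same id format as in decode(): times in centiseconds, zero-padded to 7 digits.
    start = (targets["start"].astype(float) * 100).astype(int).astype(str).str.zfill(7)
    end = (targets["end"].astype(float) * 100).astype(int).astype(str).str.zfill(7)
    targets["utterance-id"] = targets["recording-id"].astype(str) + "_" + start + "_" + end
    return targets


# Assumed example data: one real segment and one recording with no speech.
example = pd.DataFrame({
    "recording-id": ["a", "b"],
    "start": [0.5, np.nan],
    "end": [1.25, np.nan],
})
result = add_utterance_ids(example)
print(result)
```

Whether an empty result is acceptable depends on what the rest of the pipeline expects from decode, so this is only a starting point.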

rafaelgreca, Jul 23 '22 14:07