voxseg
ValueError raised when audio file has no voice activity
First of all, thank you for all of your work. This package is proving to be very helpful.
I have come across what appears to be a bug. If I supply an audio file to Voxseg and no voice activity is identified, this ValueError is thrown:
------------------- Running VAD -------------------
2021-06-08 18:15:50.236547: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-08 18:15:50.236961: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299965000 Hz
Traceback (most recent call last):
  File "voxseg/main.py", line 58, in <module>
    endpoints = run_cnnlstm.decode(targets, speech_thresh, speech_w_music_thresh, filt)
  File "../voxseg/env/lib/python3.8/site-packages/voxseg/run_cnnlstm.py", line 57, in decode
    ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/generic.py", line 5874, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 631, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 673, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "../voxseg/env/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1074, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 619, in pandas._libs.lib.astype_intsafe
ValueError: cannot convert float NaN to integer
I suspect that because no voice activity was identified, the time points (i.e. targets['start'] and targets['end']) are either missing or NaN, which causes the following code to fail:
From voxseg.run_cnnlstm.decode
    targets['utterance-id'] = targets['recording-id'].astype(str) + '_' + \
                        ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
                        ((targets['end'] * 100).astype(int)).astype(str).str.zfill(7)
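The failure can be reproduced in isolation with pandas alone. The following is a minimal sketch, assuming that `targets` ends up with NaN in `start` and `end` when no speech is found. The DataFrame here is a hypothetical stand-in, not Voxseg's actual output, and the exact error message may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `targets` when no voice activity is detected:
# the recording is present, but its endpoints are NaN.
targets = pd.DataFrame({
    "recording-id": ["rec1"],
    "start": [np.nan],
    "end": [np.nan],
})

try:
    # Same cast as in decode(): a float column containing NaN cannot become int.
    (targets["start"] * 100).astype(int)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```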
I have put together a workaround, but I figured others will likely run into this bug at some point. I would also like to know whether it could have a cause other than the lack of voice activity.
Many thanks!
Hi @a-n-rose
I'm glad to hear you've been finding the package useful.
Thank you for pointing this out. I suspect you are correct that it is caused by no voice activity being identified, which would result in NaN values or an empty DataFrame, as you said. I will do some testing to double-check that there isn't some other underlying issue causing the bug, and then I will apply a fix.
Thanks again for bringing this to my attention!
Hello @NickWilkinson37
I have attached a wav file that causes ValueError: cannot convert float NaN to integer
Best regards, Anton
Any updates? I used the following workaround in the decode function in run_cnnlstm.py to avoid this error:
targets = targets.drop(['predicted-targets'], axis=1)
targets.fillna(value=0, inplace=True)
targets = targets.apply(pd.Series.explode).reset_index(drop=True)
targets['utterance-id'] = targets['recording-id'].astype(str) + '_' + \
                        ((targets['start'] * 100).astype(int)).astype(str).str.zfill(7) + '_' + \
                        ((targets['end'] * 100).astype(int)).astype(str).str.zfill(7)
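Note that filling NaN with 0 leaves a zero-length segment (start 0, end 0) in the output for recordings without speech. An alternative is to drop those rows before building the IDs. The following is only a sketch under that assumption: `add_utterance_ids` is a hypothetical helper, not part of Voxseg, and it assumes `start` and `end` already hold one float per row (i.e. after the explode step).

```python
import numpy as np
import pandas as pd


def add_utterance_ids(targets: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows without endpoints, then build utterance IDs."""
    # Rows with NaN endpoints correspond to recordings with no detected speech.
    out = targets.dropna(subset=["start", "end"]).reset_index(drop=True)

    # Convert seconds to centiseconds and zero-pad to 7 digits, as in decode().
    start_cs = (out["start"].astype(float) * 100).astype(int).astype(str).str.zfill(7)
    end_cs = (out["end"].astype(float) * 100).astype(int).astype(str).str.zfill(7)

    out["utterance-id"] = out["recording-id"].astype(str) + "_" + start_cs + "_" + end_cs
    return out


# Example: one recording with no speech, one with a single segment.
example = pd.DataFrame({
    "recording-id": ["rec1", "rec2"],
    "start": [np.nan, 0.5],
    "end": [np.nan, 1.25],
})
result = add_utterance_ids(example)
print(result)
```

With this approach a file that contains no speech simply yields an empty DataFrame, so any downstream code needs to tolerate zero segments.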