[C++] Question: Why are the Python and C++ timestamps different?

Open NathanJHLee opened this issue 5 months ago • 20 comments

❓ Questions and Help

Hi Silero team! The Python version of silero-vad works well for me, but the C++ version gives quite different results on the same file.

I set up silero-vad 5.1 (installed via pip) and a C++ build (silero-vad-master, downloaded on 2024-08-26).

# Test sample file (VoxConverse data)
[asr1@k-atc12 cpp]$ sox --i voxconverse_data/dev/audio/afjiv.wav

Input File     : 'voxconverse_data/dev/audio/afjiv.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:02:31.25 = 2419968 samples ~ 11343.6 CDDA sectors
File Size      : 4.84M
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

sha256sum ~/miniconda3/envs/wespeaker/lib/python3.9/site-packages/silero_vad/data/silero_vad.onnx
2623a2953f6ff3d2c1e61740c6cdb7168133479b267dfef114a4a3cc5bdd788f miniconda3/envs/wespeaker/lib/python3.9/site-packages/silero_vad/data/silero_vad.onnx

# In Python

>>> from silero_vad import load_silero_vad, read_audio, get_speech_timestamps
>>> model = load_silero_vad(True)  # True: use the ONNX model
>>> wav = read_audio('/ws/stt/DB/SD/wespeaker/voxconverse_data/dev/audio/afjiv.wav')
>>> speech_timestamps = get_speech_timestamps(wav, model)
>>> for timestamp in speech_timestamps:
...     print(timestamp)
...
{'start': 84512, 'end': 474592}
{'start': 476192, 'end': 506848}
{'start': 509984, 'end': 548320}
{'start': 554528, 'end': 686048}
{'start': 688672, 'end': 787936}
{'start': 789536, 'end': 826848}
{'start': 829472, 'end': 847328}
{'start': 848928, 'end': 859616}
{'start': 862240, 'end': 1046496}
{'start': 1048096, 'end': 1068000}
{'start': 1071136, 'end': 1341408}
{'start': 1357344, 'end': 1379296}
{'start': 1392160, 'end': 1408992}
{'start': 1418784, 'end': 1427936}
{'start': 1431584, 'end': 1485280}
{'start': 1488928, 'end': 1511904}
{'start': 1520672, 'end': 1569248}
{'start': 1578016, 'end': 1610208}
{'start': 1617440, 'end': 1651168}
{'start': 1653280, 'end': 1675744}
{'start': 1686048, 'end': 1710048}
{'start': 1715232, 'end': 1726432}
{'start': 1730080, 'end': 1751008}
{'start': 1753120, 'end': 1773536}
{'start': 1776160, 'end': 1791968}
{'start': 1795104, 'end': 1813984}
{'start': 1820192, 'end': 1860576}
{'start': 1869344, 'end': 1907680}
{'start': 1909280, 'end': 1959392}
{'start': 1966624, 'end': 1989088}
{'start': 2002976, 'end': 2050016}
{'start': 2055712, 'end': 2077152}
{'start': 2093600, 'end': 2132448}
{'start': 2138656, 'end': 2147808}
{'start': 2169888, 'end': 2211296}
{'start': 2222112, 'end': 2244064}
{'start': 2249760, 'end': 2267616}
{'start': 2271264, 'end': 2302944}
{'start': 2313760, 'end': 2327520}

# In C++ (built from the silero-vad source; I downloaded 'silero-vad-master' on 2024-08-26)
I changed some parameters in 'silero-vad-master/examples/cpp/silero-vad-onnx.cpp':
float Threshold = 0.5,
int min_silence_duration_ms = 100,
int speech_pad_ms = 30,
int min_speech_duration_ms = 250,
# These values are taken from '~/miniconda3/envs/wespeaker/lib/python3.9/site-packages/silero_vad/utils_vad.py'

sha256sum "../../src/silero_vad/data/silero_vad.onnx"
2623a2953f6ff3d2c1e61740c6cdb7168133479b267dfef114a4a3cc5bdd788f

./test
[asr1@k-atc12 cpp]$ ./test
num_channel_ :1
sample_rate_ :16000
bits_per_sample_:16
num_samples :2419968
num_data_size :4839936
{start:00019456,end:00200192}
{start:00202752,end:00258048}
{start:00261120,end:00400384}
{start:00403456,end:00473600}
{start:00477184,end:00506880}
{start:00510976,end:00548864}
{start:00555520,end:00637952}
{start:00642560,end:00686592}
{start:00689152,end:00727552}
{start:00729600,end:00787456}
{start:00790016,end:00826880}
{start:00829952,end:00846848}
{start:00849920,end:00858112}
{start:00863232,end:01068032}
{start:01071616,end:01083904}
{start:01088000,end:01289216}
{start:01295360,end:01311744}
{start:01314816,end:01324032}
{start:01326592,end:01340928}
{start:01357824,end:01378816}
{start:01394688,end:01408512}
{start:01420288,end:01427968}
{start:01432576,end:01484800}
{start:01491456,end:01510912}
{start:01521152,end:01569280}
{start:01578496,end:01609216}
{start:01619456,end:01625088}
{start:01627648,end:01650176}
{start:01655296,end:01676288}
{start:01687040,end:01710080}
{start:01716224,end:01724928}
{start:01731072,end:01750528}
{start:01754112,end:01762304}
{start:01765888,end:01772544}
{start:01777664,end:01790976}
{start:01796608,end:01813504}
{start:01821184,end:01859072}
{start:01873408,end:01906176}
{start:01910272,end:01923072}
{start:01926144,end:01959936}
{start:01967616,end:01989120}
{start:02003968,end:02050048}
{start:02058752,end:02076160}
{start:02094592,end:02114048}
{start:02116608,end:02131968}
{start:02170880,end:02191872}
{start:02195456,end:02211840}
{start:02223104,end:02244096}
{start:02250240,end:02267648}
{start:02272256,end:02303488}
{start:02314752,end:02327552}
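To put a number on how far the two outputs diverge, a rough comparison script may help. The sketch below is only illustrative: it uses the first few segments copied from each listing above, the 16000 Hz sample rate reported by sox, and hypothetical helper names (`to_seconds`, `total_speech`, `overlap`, `start_remainders`) that are not part of silero-vad. The 512-sample grid is an assumption about the model's window size, which this post does not state.

```python
SAMPLE_RATE = 16000  # from the sox output for afjiv.wav
GRID = 512           # ASSUMPTION: VAD window size in samples, not stated in this post

# First few segments copied from the Python and C++ outputs (sample indices).
python_segments = [(84512, 474592), (476192, 506848), (509984, 548320)]
cpp_segments = [
    (19456, 200192), (202752, 258048), (261120, 400384),
    (403456, 473600), (477184, 506880), (510976, 548864),
]


def to_seconds(segments, sample_rate=SAMPLE_RATE):
    """Convert (start, end) sample indices to (start, end) in seconds."""
    return [(start / sample_rate, end / sample_rate) for start, end in segments]


def total_speech(segments):
    """Total number of samples marked as speech."""
    return sum(end - start for start, end in segments)


def overlap(a, b):
    """Samples marked as speech in both lists (each sorted and non-overlapping)."""
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            shared += end - start
        # Advance whichever segment finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def start_remainders(segments, grid=GRID):
    """Offset of each segment start from the assumed window grid."""
    return [start % grid for start, _ in segments]


print("python speech (s):", total_speech(python_segments) / SAMPLE_RATE)
print("c++ speech (s):   ", total_speech(cpp_segments) / SAMPLE_RATE)
print("shared speech (s):", overlap(python_segments, cpp_segments) / SAMPLE_RATE)
print("python start offsets:", start_remainders(python_segments))
print("c++ start offsets:   ", start_remainders(cpp_segments))
```

Running the same helpers on the full lists would show whether the mismatch is a constant offset at the boundaries or a real difference in where speech is detected.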

I checked the checksums of both ONNX models, and they are identical. Any clues? Thank you.

NathanJHLee, Aug 28 '24 05:08