SincNet
Is the off-by-one error on purpose?
According to this line: https://github.com/mravanelli/SincNet/blob/d64244991324f96d77add11dc86939a7a81ae14d/compute_d_vector.py#L215
With wlen = 200 samples and wshift = 10 samples (I'm aware that 200 and 10 refer to milliseconds in the paper) and an audio signal of length 210 samples, this produces a tensor whose first dimension is int((210 - 200) / 10) == 1. However, this signal can yield two examples: the range [0, 200) is the first one, and the range [10, 210) is the second one.
compute_d_vector.py discards the second one. Is this on purpose, or is it an off-by-one error?
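To make the arithmetic concrete, here is a minimal sketch of the two frame counts. The function names are hypothetical and not taken from the repository; `n_frames_current` only mirrors the formula quoted above, and `n_frames_inclusive` is the count I would expect if the last full window were kept.

```python
def n_frames_current(signal_len, wlen, wshift):
    # Formula as quoted in this issue: int((len - wlen) / wshift).
    return int((signal_len - wlen) / wshift)


def n_frames_inclusive(signal_len, wlen, wshift):
    # Assumed alternative: count every full window, including the last one.
    if signal_len < wlen:
        return 0
    return (signal_len - wlen) // wshift + 1


def frame_ranges(signal_len, wlen, wshift):
    # Half-open [begin, end) ranges of every full window in the signal.
    return [(beg, beg + wlen)
            for beg in range(0, signal_len - wlen + 1, wshift)]


if __name__ == "__main__":
    print(n_frames_current(210, 200, 10))    # frames kept by the quoted formula
    print(n_frames_inclusive(210, 200, 10))  # frames that fit in the signal
    print(frame_ranges(210, 200, 10))        # the windows themselves
```

With a 210-sample signal the quoted formula gives one frame, while the enumeration lists both [0, 200) and [10, 210).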
I'm asking because I observed that the "paper version" has a slightly lower mean cosine similarity when comparing different audio files in data_lists/TIMIT_test.scp.
| | "two examples version" | "paper version" ("one example") |
|---|---|---|
| mean | 0.74994516 | 0.7498444 |
| std | 0.08081242 | 0.080853514 |