athena
The delta pitch feature seems to be strange
Hi, and thanks for the fascinating work! I am using Athena to extract MFCC and pitch features without Kaldi. It went smoothly, but when I inspected the values of the delta pitch feature (the last dimension of the default 3-dim pitch feature), I got confused. Here is what I found:
- Although the pitch feature and the warped NCCF (POV feature) values seem slightly different from those extracted by Kaldi, the trajectories are similar and the differences are small. That is fine, but with default settings the delta pitch feature is very different from Kaldi's output. I did not find any difference between Athena's default pitch settings and Kaldi's. The delta pitch extracted by Athena looks like a noise sequence.
- I therefore lowered the standard deviation of the noise added to delta pitch, all the way down to 0. Athena then outputs a sequence of almost all zeros, so I believe there is something wrong in the source code.
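For comparison, here is a minimal sketch of how a delta feature of this kind is commonly computed: a regression over neighboring frames of log pitch, scaled, with optional Gaussian noise added afterwards. This is neither Athena's nor Kaldi's code. The function name `delta_log_pitch` is hypothetical, and the window of 2, the scale of 10, and the edge clamping are assumptions modeled on Kaldi's pitch post-processing defaults.

```python
import numpy as np


def delta_log_pitch(pitch_hz, window=2, scale=10.0, noise_stddev=0.0, seed=0):
    """Regression-style delta of log pitch, with optional Gaussian noise.

    Hypothetical helper for illustration only; the defaults are assumptions.
    """
    log_pitch = np.log(np.asarray(pitch_hz, dtype=np.float64))
    num_frames = len(log_pitch)

    # Regression weights j / sum(j^2) for j in [-window, window].
    offsets = np.arange(-window, window + 1)
    weights = offsets / np.sum(offsets ** 2)

    delta = np.zeros(num_frames)
    for j, w in zip(offsets, weights):
        # Clamp indices at the edges so the first/last frame is repeated.
        idx = np.clip(np.arange(num_frames) + j, 0, num_frames - 1)
        delta += w * log_pitch[idx]
    delta *= scale

    # Noise is added only after the delta itself has been computed.
    if noise_stddev > 0.0:
        rng = np.random.default_rng(seed)
        delta += rng.normal(0.0, noise_stddev, size=num_frames)
    return delta


# Synthetic pitch track whose log rises by 0.01 per frame.
pitch = 100.0 * np.exp(0.01 * np.arange(20))
delta = delta_log_pitch(pitch)
print(delta[2:-2])
```

With the noise set to 0, a delta computed this way is zero only where the log pitch is flat across the window, so an all-zero output on speech with audible pitch movement would be unexpected under these assumptions.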
Below are some results.
This plot is the 4-dim pitch extracted by Kaldi.
This plot is the 3-dim pitch extracted by Athena (in default settings).
After setting the delta_pitch_noise_stddev to 0, I get the result below.
Hi, I tried extracting the pitch feature from the athena-transform example audio, 'examples/sm1_cln.wav', with the following configuration: 'window_length': 0.025, 'soft_min_f0': 10.0, 'delta_pitch_noise_stddev': 0. The output looks normal:
Transform:
[[3.8811225e-02 3.0000305e-01 3.5762787e-07]
 [6.7564729e-03 3.0000973e-01 3.5762787e-07]
 [2.4553644e-02 3.0001450e-01 3.5762787e-07]
 [2.4535857e-02 3.0002213e-01 3.5762787e-07]
 [3.4553111e-02 3.0003071e-01 3.5762787e-07]
 [4.2932931e-02 3.0004215e-01 3.5762787e-07]]
@JianweiSun007 Thanks for trying this. I ran the same experiment, and my result is close to yours.
But there is still a problem with this result. The last dimension of the pitch output is always very close to 0 (on the order of 1e-7). Since this wav obviously has pitch variation, the delta pitch feature should not stay that small in every frame. The result is also very different from Kaldi's delta pitch.
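The observation can be checked directly on the six frames quoted earlier in this thread. The sketch below only inspects those printed values and assumes, as stated in the original report, that the last column is the delta pitch. It also compares the constant with the float32 machine epsilon, which may hint at a rounding residue rather than a real delta signal, though that interpretation is a guess.

```python
import numpy as np

# Athena output quoted in this thread (delta_pitch_noise_stddev = 0).
feats = np.array(
    [
        [3.8811225e-02, 3.0000305e-01, 3.5762787e-07],
        [6.7564729e-03, 3.0000973e-01, 3.5762787e-07],
        [2.4553644e-02, 3.0001450e-01, 3.5762787e-07],
        [2.4535857e-02, 3.0002213e-01, 3.5762787e-07],
        [3.4553111e-02, 3.0003071e-01, 3.5762787e-07],
        [4.2932931e-02, 3.0004215e-01, 3.5762787e-07],
    ],
    dtype=np.float32,
)

# Peak-to-peak range of the last column (assumed to be delta pitch).
last = feats[:, -1]
last_range = float(np.ptp(last))

# Peak-to-peak range of the remaining columns, for contrast.
other_ranges = np.ptp(feats[:, :-1], axis=0)

# How many float32 epsilons the constant corresponds to.
eps32 = np.finfo(np.float32).eps
ratio = float(last[0] / eps32)

print("range of last column:", last_range)
print("range of other columns:", other_ranges)
print("last column / float32 eps:", ratio)
```

A last column that does not move at all while the other columns do is consistent with the delta term never being computed from the pitch track, which is what the comparison with Kaldi suggests.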
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is closed. You can also re-open it if needed.