Viterbi algorithm does not apply to activation probabilities

Open sannawag opened this issue 4 years ago • 0 comments

I would like to use the output of CREPE to determine whether the singer is active or silent at the perceptual level. That state should change on the scale of seconds, not milliseconds. Setting a hard threshold on the confidence, though, results in rapid alternation between the two states. The alternation shows up as the thick vertical lines in the plots below.

Viterbi decoding would be a straightforward way to smooth this out. The current version, though, only applies Viterbi smoothing to the pitch, not to the voicing decision. I wrote an extension and opened a pull request in case it is useful for others: https://github.com/marl/crepe/pull/26.

[Image: Screen Shot 2020-06-02 at 4.59.50 PM] [Image: Screen Shot 2020-06-02 at 5.15.31 PM]

Code for this plot:

import csv
import matplotlib.pyplot as plt
import numpy as np

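f0 = []       # per-frame pitch estimates (Hz)
conf = []     # per-frame voicing confidence
thresh = 0.5  # confidence threshold for the voiced/silent decision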

with open('MUSDB18HQ/train/Music Delta - Hendrix/vocals.f0.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            f0.append(float(row[1]))
            conf.append(float(row[2]))
            line_count += 1
    print(f'Processed {line_count} lines.')

# Hard-threshold the confidence into a binary voiced/silent decision per frame
voiced = [1 if c > thresh else 0 for c in conf]
# plt.plot(np.array(f0) * np.array(voiced))
plt.plot(np.array(voiced))
plt.show()
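
As a rough illustration of the smoothing idea (not the code in the pull request), a two-state Viterbi decode over the confidence values might look like the sketch below. The function name `smooth_voicing`, the `p_stay` self-transition probability, and the choice to treat the confidence directly as P(voiced) for each frame are all assumptions made for this example.

```python
import numpy as np


def smooth_voicing(conf, p_stay=0.99, eps=1e-6):
    """Two-state Viterbi decode of a confidence sequence (hypothetical helper).

    States: 0 = silent, 1 = voiced.
    Assumption: conf[t] is used directly as P(voiced) at frame t.
    p_stay is an assumed self-transition probability; higher values
    penalize switching more strongly and give longer segments.
    """
    conf = np.clip(np.asarray(conf, dtype=float), eps, 1.0 - eps)
    n_frames = len(conf)
    if n_frames == 0:
        return np.empty(0, dtype=int)

    # Log emission scores per frame: column 0 = silent, column 1 = voiced
    log_emit = np.log(np.stack([1.0 - conf, conf], axis=1))
    # log_trans[i, j] = log P(next state j | current state i)
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))

    score = np.empty((n_frames, 2))
    back = np.zeros((n_frames, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]  # uniform prior over the two states

    for t in range(1, n_frames):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], [0, 1]] + log_emit[t]

    # Backtrack from the best final state
    states = np.empty(n_frames, dtype=int)
    states[-1] = np.argmax(score[-1])
    for t in range(n_frames - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states
```

With the `conf` list from the plotting code, `plt.plot(smooth_voicing(conf))` could replace the hard-thresholded `voiced` plot. At CREPE's default 10 ms step size, `p_stay=0.99` corresponds to an expected segment length of roughly 100 frames, or about one second. That value is only a starting point and would need tuning against real recordings.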

sannawag · Jun 08 '20 02:06