This PR adds a feature to estimate the noise variance of a signal. This is useful in many thresholding algorithms such as 'VisuShrink' and 'RiskShrink'. Below is an example using this function to find a good threshold value for a contrived data set with noise added. This shows how choosing a threshold too high or too low will not perform as well. Inspired by Issue #394

There are remaining details to work out yet, I don't know what the most correct thing to do on line 319 of _thresholding.py, as we do not have scipy as a dependency.

I am sure there are other little things as well as a better test set and docs, and a final decision on the name of the function and where it will live.

9plots_threshold_doppler Middle right figure shows the correct threshold applied, top is too high and bottom is too low.

import matplotlib.pyplot as plt
import pywt

N = 10000
sigmaIn = 0.25
data = pywt.data.demo_signal('doppler', n=N)
data_noised = data + np.random.normal(0,sigmaIn,N)

wavelet = 'db4'
coeffs = pywt.wavedecn(data_noised, wavelet=wavelet)
# Detail coefficients at each decomposition level
dcoeffs = coeffs[1:]
detail_coeffs = dcoeffs[-1]['d']
sigma = pywt.estimate_sigma(detail_coeffs)
print(f'sigma(estimated) = {sigma}\n'
      f'sigma(real)      = {sigmaIn}\n'
      f'%error           = {(np.abs(sigma-sigmaIn)/sigmaIn)*100:0.2f}%')
# Method for finding threshold, 'VisuShrink'
threshold = sigma*np.sqrt(2*np.log(data_noised.size))

denoised_datas = []
#Denoise the data using the 'correct' threshold as well as one too high
#and one too low, it can be seen that too high will distort the signal
#while too low will leave in too much of the noise.

for thresh in [threshold*2, threshold, threshold/2]:
    denoised_detail = [{key: pywt.threshold(level[key],
                                value=thresh) for key in level}
                                for level in dcoeffs]
    denoised_coeffs = [coeffs[0]] + denoised_detail
    denoised_datas.append(pywt.waverecn(denoised_coeffs, wavelet))

for i, denoised_data in enumerate(denoised_datas):
    plt.subplot(3,3,1+3*i)
    plt.plot(data)
    plt.subplot(3,3,2+3*i)
    plt.plot(data_noised)
    plt.subplot(3,3,3+3*i)
    plt.plot(denoised_data)
plt.tight_layout()
plt.show()``` #

Aug 03 '18 04:08 micha2718l

Codecov Report

Merging #395 into master will decrease coverage by 0.09%. The diff coverage is 64.7%.

@@            Coverage Diff            @@
##           master     #395     +/-   ##
=========================================
- Coverage   84.52%   84.43%   -0.1%     
=========================================
  Files          22       22             
  Lines        3600     3616     +16     
  Branches      627      630      +3     
=========================================
+ Hits         3043     3053     +10     
- Misses        489      493      +4     
- Partials       68       70      +2

Impacted Files	Coverage Δ
pywt/_thresholding.py	`82.27% <64.7%> (-5.03%)`	:arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d4be78d...407985b. Read the comment docs.

Aug 04 '18 18:08 codecov-io

Hi @micha2718l. Over in #394, @rgommers suggested renaming to estimate_noise or estimate_noiselevel. I think estimate_noise sounds reasonable and may be clearer to users than estimate_sigma. What do you think?

Oct 15 '19 15:10 grlee77

I agree that estimate_noise is probably best. I'll go ahead and change it.

Hi @micha2718l. Over in #394, @rgommers suggested renaming to estimate_noise or estimate_noiselevel. I think estimate_noise sounds reasonable and may be clearer to users than estimate_sigma. What do you think?

Oct 15 '19 18:10 micha2718l

@rgommers @grlee77 Hi All! Sorry for the multi-year delay on this, I had some time and was reminded of this work. Would someone with access please enable the CI workflow here to help with tests (I don't think it existed in this state when I first started on this)/maybe it doesn't need to be enabled, if not then let me know what the next steps need to be.

Mar 06 '24 19:03 micha2718l

Hi @micha2718l, thanks for getting back to this. I have just hit the button to have CI run again. Note that this is needed on each new push. To avoid the problem, please feel free to submit a separate PR with for example a typo fix or trivial rewording - I'd be happy to merge such a PR straight away, and once you have a commit in the default branch, GitHub will trigger CI on future pushes to this PR without needing approval.

Mar 08 '24 09:03 rgommers

add estimate_sigma function to threshold features

Codecov Report