add estimate_sigma function to threshold features
This PR adds a feature to estimate the noise variance of a signal. This is useful in many thresholding algorithms such as 'VisuShrink' and 'RiskShrink'. Below is an example using this function to find a good threshold value for a contrived data set with noise added. This shows how choosing a threshold too high or too low will not perform as well. Inspired by Issue #394
There are remaining details to work out yet, I don't know what the most correct thing to do on line 319 of _thresholding.py, as we do not have scipy as a dependency.
I am sure there are other little things as well as a better test set and docs, and a final decision on the name of the function and where it will live.
Middle right figure shows the correct threshold applied, top is too high and bottom is too low.
import matplotlib.pyplot as plt
import pywt
N = 10000
sigmaIn = 0.25
data = pywt.data.demo_signal('doppler', n=N)
data_noised = data + np.random.normal(0,sigmaIn,N)
wavelet = 'db4'
coeffs = pywt.wavedecn(data_noised, wavelet=wavelet)
# Detail coefficients at each decomposition level
dcoeffs = coeffs[1:]
detail_coeffs = dcoeffs[-1]['d']
sigma = pywt.estimate_sigma(detail_coeffs)
print(f'sigma(estimated) = {sigma}\n'
f'sigma(real) = {sigmaIn}\n'
f'%error = {(np.abs(sigma-sigmaIn)/sigmaIn)*100:0.2f}%')
# Method for finding threshold, 'VisuShrink'
threshold = sigma*np.sqrt(2*np.log(data_noised.size))
denoised_datas = []
#Denoise the data using the 'correct' threshold as well as one too high
#and one too low, it can be seen that too high will distort the signal
#while too low will leave in too much of the noise.
for thresh in [threshold*2, threshold, threshold/2]:
denoised_detail = [{key: pywt.threshold(level[key],
value=thresh) for key in level}
for level in dcoeffs]
denoised_coeffs = [coeffs[0]] + denoised_detail
denoised_datas.append(pywt.waverecn(denoised_coeffs, wavelet))
for i, denoised_data in enumerate(denoised_datas):
plt.subplot(3,3,1+3*i)
plt.plot(data)
plt.subplot(3,3,2+3*i)
plt.plot(data_noised)
plt.subplot(3,3,3+3*i)
plt.plot(denoised_data)
plt.tight_layout()
plt.show()``` #
Codecov Report
Merging #395 into master will decrease coverage by
0.09%. The diff coverage is64.7%.
@@ Coverage Diff @@
## master #395 +/- ##
=========================================
- Coverage 84.52% 84.43% -0.1%
=========================================
Files 22 22
Lines 3600 3616 +16
Branches 627 630 +3
=========================================
+ Hits 3043 3053 +10
- Misses 489 493 +4
- Partials 68 70 +2
| Impacted Files | Coverage Δ | |
|---|---|---|
| pywt/_thresholding.py | 82.27% <64.7%> (-5.03%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update d4be78d...407985b. Read the comment docs.
Hi @micha2718l. Over in #394, @rgommers suggested renaming to estimate_noise or estimate_noiselevel. I think estimate_noise sounds reasonable and may be clearer to users than estimate_sigma. What do you think?
I agree that estimate_noise is probably best. I'll go ahead and change it.
Hi @micha2718l. Over in #394, @rgommers suggested renaming to
estimate_noiseorestimate_noiselevel. I thinkestimate_noisesounds reasonable and may be clearer to users thanestimate_sigma. What do you think?
@rgommers @grlee77 Hi All! Sorry for the multi-year delay on this, I had some time and was reminded of this work. Would someone with access please enable the CI workflow here to help with tests (I don't think it existed in this state when I first started on this)/maybe it doesn't need to be enabled, if not then let me know what the next steps need to be.
Hi @micha2718l, thanks for getting back to this. I have just hit the button to have CI run again. Note that this is needed on each new push. To avoid the problem, please feel free to submit a separate PR with for example a typo fix or trivial rewording - I'd be happy to merge such a PR straight away, and once you have a commit in the default branch, GitHub will trigger CI on future pushes to this PR without needing approval.