DSP.jl
DSP.jl copied to clipboard
add findpeaks
Introduction
findpeaks
is a function which extracts filtered local maxima based on a few parameters.
It is often needed in data processing related to analysis of 1D spectra or other noisy signal data where finding maxima (or minima) is a routine task.
This function exists in MATLAB (Signal Processing Toolbox) and Python (scipy).
Contents
This PR contains source code, tests, docs changes, a data file and two pictures, all related to the findpeaks
function.
Motivation
It was recently suggested to me to by @cormullion that a repository I created a long time ago would be better placed in the DSP package. I base this assumption on the following active issue: https://github.com/JuliaDSP/DSP.jl/issues/303
Notes
I am certain that the code I am proposing to you for review may be further optimized (possibly through use of lazy iterators and by minimizing the number of passes through the data vector) and its functionality expanded. Sadly, I am not an active Julia user right now nor have the time to implement the necessary changes at the moment.
Also there is another package with similar functionality: https://github.com/halleysfifthinc/Peaks.jl
Thank you for consideration!
Cheers and thanks - I hope (for your sake and mine) that this can make it into the DSP package...
Thanks for the PR, I'll review it soon.
Thanks @tungli for the PR.
I have always used the images.jl findlocalmaxima for this purpose. What would this function provide over this existing implementation?
I think its ok to have some duplication across packages, but we should consider this before merging. findlocalmaxima works in multiple dimensions, but this function seems specific to one dimension? What other differences are there? Images.jl is well maintained and very efficient, I think we would need a good reason to duplicate functionality.
@rob-luke I don't think findlocalmaxima
offers filtering functionality.
In processing noisy data you want to filter local maxima, usually based on some global property.
I've added some pictures that illustrate what I mean.
To do something similar in 2D or higher dimensions is much harder, that's why this is restricted to 1D.
I don't quite understand what you mean by filtering functionality, could you elaborate on what you mean here and be as explicit as possible. But based on your figures I think I understand what you are trying to achieve.
Here are a few example snippets I use to find maxima (or peaks) using Images.jl. I have run them on your data to match your figures
maxima = findlocalmaxima(imfilter(y, KernelFactors.gaussian((3))))
maxima = [getindex(max, 1) for max in maxima]
f = plot(x, y)
scatter!(x[maxima], y[maxima])
or you could use Images.jl mapwindow with an additional line to implement your min_dist
...
maxima = unique(mapwindow(maximum, y, 101, border="replicate"))
idxs = [findfirst(y .== max) for max in maxima]
[i[2] == 1 ? y[idxs[i[1]-1]] < y[idxs[i[1]]] ? idxs[i[1]-1] = idxs[1] : 0 : 0 for i in enumerate([2; diff(idxs)])]
f = plot(x, y)
scatter!(x[newidx], y[newidx])
I am sure @timholy has cleaner examples of how to use mapwindow on one dimensional data, but this should illustrate that peak finding is already implemented in a robust existing package. Images.jl doesn't have the additional arguments you have like min_height, but these are just one liners as illustrated above. There have also been several other packages to produce maxima or minima filters (MinMaxFilter.jl, MaxMinFilter.jl, RollingFilters.jl, LocalFilters.jl, etc) and see nice discussion here. Further, the images.jl code works in multiple dimensions.
But I guess the problem is that people aren't managing to find the Images.jl functions or cant see how it can be used to achieve 1 dimensional filtering (see also https://github.com/JuliaDSP/DSP.jl/issues/343). So maybe it would be handy to point people in the right direction? This could be something for the new website (https://github.com/JuliaDSP/juliadsp.github.io/issues/1#issuecomment-647051107)?
Alternatively I may have misunderstood what you are trying to achieve, so please feel free to point out where I have got mixed up. And where your proposal extends beyond Images.jl. I am not opposed to adding a peak finding function, but I think it needs to be clear how it provides additional functionality over what's capable with Images.jl.
I think this is similar to Matlab's findpeaks
, which allows you to filter by peak prominence and half width etc., which is hard to recreate with just filtering a signal and then finding local maxima in the filtered signal (e.g. see their section on prominence: https://uk.mathworks.com/help/signal/ref/findpeaks.html#buff2t6). Though I don't use this function very often in Matlab, my understanding is that it would be hard to recreate its peak filtering behavior with Images.jl or similar packages. Is that right @tungli ?
Ok so there is some specific behaviour that this PR implements. It would be handy to provide an example which highlights the benefits of this approach and under what situations this method is advantageous over existing implementations, this isn’t evident from the example provided in this PR.
Also, I’m just curious now, maybe I should be using this prominence approach rather than my snippets.
Thanks for sharing your approach with Images.jl @rob-luke and for the links. You are correct - this PR can mostly be replaced by appropriate use of Images.jl with some filters.
As @galenlynch points out the prominence approach is quite powerful - it is also the reason why I implemented the code in this PR.
By tuning this one parameter, findpeaks
(almost always) gives you peaks that you yourself would consider to be peaks by eye (as opposed to noise).
I think there are two questions that have to be decided on:
- Does the peak-finding functionality need to be in a signal processing package as well, given that it is already in Images.jl or other less-known packages?
- Is this the interface DSP.jl wants to provide their users (i.e. filtering through few parameters (prominence, etc..)).
My own thoughts on these are:
Given that in Matlab and Python findpeaks
is found in a signal processing, my guess is people would expect to find it here (as can be extrapolated from the links in the Motivation section of PR).
As for the interface, I find this to be intuitive and easy-to-use for people processing spectra/signal. It may not be the most general or efficient approach but I think it could satisfy users with background in Matlab/Python. Since I am not really familiar with DSP.jl, you guys need to decide whether this fits in with the rest of DSP.jl.
I think it's appropriate, and the interface is intuitive. Thanks for submitting the PR, I am almost finished with my review of it.
Thank you for the review @galenlynch! I really appreciate it. Unfortunately, I am a bit busy these few weeks. With some luck, I will be able to resolve the issues this or the next weekend. I am sorry for the delay.
No worries on the delay, thanks once again for the great PR!
@galenlynch and others, I apologize for the huge delay.
I changed a few things that revolve around collecting additional information about peaks during the search.
This done similar to how scipy does it.
Now a struct
is returned together with the peaks (although I still have to write tests for the struct part).
I felt this change was necessary mainly because of the plateau handling (see comment above -> returning only one point of the plateau would result in loss of information about the plateau) and it is also a nice and, more or less, free benefit to have the peak information available.
I have also included every one of @galenlynch's comments, but this one:
Strange things happen if x is, for example, all zeros. For many of the default filtering values below, I assume using the default value means "do not filter". However, in this (corner) case filtering is still done. I think using nothing as a default value for keyword parameters to indicate that no filtering should be done would make sense.
which I need to think through a bit more. It seems to me that having all zero x
(or even constant x
) would be a bit inconsistent with peak finding. Also, up to now I avoided Nothing
with the choice of defaults in such a way to simulate that no filtering is done (although the filter is run and does not (or at least should not) change anything).
I still have doubts about the plateau handling, because the plateau filtering gets cancelled out by the threshold filtering and vice versa. This also seems to be the case in scipy.
Perhaps a better version would treat every plateau as a single peak.
For this I think the modification will need to be the following:
- change the order of filtering and find plateaus first
- after plateaus are found, change the internal data representation of
y
such that there are no consequent repeated values (an example - return only the
PeakInfo
struct (no peaks)
The last point gets rid of the where-in-the-plateau-is-the-peak? issue - every peak will be treated as a range (included in the PeakInfo
)
Minimal distance combined with plateaus will also probably be more meaningful.
As an interested party (author of Peaks.jl), I will throw in my (biased) 2-cents:
If I am correctly following the functionality being added/requested, the goal is to:
- find local maxima of a range of
min_dist
- filter maxima by peak prominence, height, and threshold
I will note that the latest Peaks.jl version handles min_dist
, plateaus (MATLAB behavior), prominence calculation and filtering, and supports the presence of NaN
and missing
values.
Regardless, on the subject of API bikeshedding, I think that factoring out the different features (peak finding, calculation and filtering by prominence, height, and threshold) into separate functions would present a simpler and more flexible interface (eg something along the lines of findpeaks
, peakprominence
, peakheight
, peakthreshold
).
I also think that structs being returned would unnecessarily complicate the user's handling of the returned data. In most cases, users will only need one or two of the fields in PeakInfo
; in particular, plateaus will not be a concern for many sources of data. Separate functions (eg, in the case of plateaus, plateau_extents
or similar) for each feature would allow greater control by the user over what is being returned.
I'm also curious how the performance compares to equivalent Peaks.jl or Images.jl functions; somewhat dated benchmarking I have previously done showed similar performance scaling between Peaks.jl and Images.jl, with a slight edge towards my package.
This pull request is very interesting. Is there any way to solve the 2D case? I'm looking for the Julia version of skimage.feature.peak_local_max but it seems this may not exist yet..?
using Images
help?> findlocalmaxima
search: findlocalmaxima findlocalminima
findlocalmaxima(img, [region, edges]) -> Vector{CartesianIndex}
Returns the coordinates of elements whose value is larger than all of their immediate neighbors. region is a list of dimensions to consider. edges is a boolean specifying whether to include the first
and last elements of each dimension, or a tuple-of-Bool specifying edge behavior for each dimension separately.
This seems like interesting functionality. I am wondering when it may be available from within DSP.jl?
Would it make sense to update the documentation in DSP.jl to refer to the Peaks.jl package?
I agree that referencing Peaks.jl in DSP.jl documentation would be useful.