
[ENH] Add new ERP measures

withmywoessner opened this issue 1 year ago • 58 comments

Reference issue

Adds features from #7848

What does this implement/fix?

  • ~~Mean amplitude~~
  • Area amplitude (all positive, numeric integration)
  • Fractional area latency
  • Fractional peak latency

EDIT: Also adds a function to find the ERP peak.

withmywoessner avatar Feb 08 '24 20:02 withmywoessner

@larsoner @drammock I went ahead and created a pull request. There's nothing to review yet. Do you think scipy.integrate.trapezoid is fine to calculate the areas? Or do you know of something faster?

Also, I saw another issue related to Type Hints; do you think these are still useful even though these are 'global' functions not called from Evoked?

withmywoessner avatar Feb 08 '24 20:02 withmywoessner

Hi all, I also wanted to ask if the get_mean(evoked) function is even necessary. Can't you just do evoked.data.mean()? That will be equivalent (up to a scale factor) to integrate.trapz(data, times) when the sampling rate is uniform. Is it ever the case in MEG/EEG research that the sampling rate would be non-uniform?

Edit: I guess it would be nice to not have to use other operations to restrict the time interval. So in that case maybe I should just remove (or update) the tutorial that mentions using .data.mean().

withmywoessner avatar Feb 09 '24 00:02 withmywoessner

In MNE-Python, sample rates are always uniform.

I guess it would be nice to not have to use other operations to restrict the time interval. So in that case maybe I should just remove (or update) the tutorial that mentions using .data.mean().

Yeah, if it's as easy as evoked.copy().crop(x, y).data.mean(), I'd rather teach people this one-liner!
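A minimal sketch of that one-liner in context (the file name and window here are hypothetical):

    import mne

    # Hypothetical evoked file; any -ave.fif works here.
    evoked = mne.read_evokeds("sample-ave.fif", condition=0)
    # Mean amplitude over an 80-120 ms window, across channels and times.
    mean_amp = evoked.copy().crop(tmin=0.08, tmax=0.12).data.mean()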

larsoner avatar Feb 12 '24 16:02 larsoner

Do you think scipy.integrate.trapezoid is fine to calculate the areas? Or do you know of something faster?

trapz should be plenty fast, I think
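For reference, a self-contained sketch of the trapezoid approach on made-up data (shapes follow the (n_channels, n_times) convention):

    import numpy as np
    from scipy.integrate import trapezoid

    rng = np.random.default_rng(0)
    times = np.linspace(0.0, 0.5, 251)           # seconds
    data = rng.standard_normal((3, times.size)) * 1e-6
    # Rectified ("all positive") area, one value per channel.
    areas = trapezoid(np.abs(data), times, axis=-1)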

I saw another issue related to Type Hints

Currently the plan is to add type hints to whole submodules at once (e.g., do it for mne.stats, then for mne.minimum_norm, etc), rather than adding them piecemeal to each bit of code that gets changed. We can enable mypy testing for each submodule separately, so that will ensure the type hints stay accurate/complete once they're added. If we add them one function at a time, it's hard to test them because of all the unannotated functions surrounding them.

drammock avatar Feb 12 '24 20:02 drammock

Hey @drammock, thanks for helping me with this!

I know this is still in draft mode, but I wanted to ask why all the other files within stats are at the root level (stats/_adjacency.py, stats/cluster_level.py, etc), but erp is going in a subfolder (stats/erp/_erp.py)? Is there a particular reason you're choosing to organize it that way?

I thought that's how @larsoner mentioned it should be structured: https://github.com/mne-tools/mne-python/issues/7848#issuecomment-1924014304

withmywoessner avatar Feb 12 '24 20:02 withmywoessner

I thought that's how @larsoner mentioned it should be structured

ah OK. He probably had a good reason :)

drammock avatar Feb 12 '24 21:02 drammock

@drammock @larsoner Is it better to match the functionality of get_peak by returning only the area of the channel with the highest area for the get_area function? This may be unintuitive for things like get_fractional_area_latency, which would return a latency, and I do not think there's any reason one would want just the highest latency.

For that reason, I am leaning more towards having all the new functions return the new measures for all channels, even get_area.

withmywoessner avatar Feb 16 '24 06:02 withmywoessner

I am not familiar with what people actually do for ERPs -- for example averaging over a subset of channels then computing area, computing it for a single channel, computing it for multiple channels, etc.

It's a reasonable starting point to match get_peak at least. If people really want it for all channels they can always do a one-liner like:

areas = {ch_name: evoked.copy().pick([ch_name]).get_area() for ch_name in evoked.ch_names}

or similar, too.

larsoner avatar Feb 16 '24 13:02 larsoner

I am not familiar with what people actually do for ERPs

same here; I've read some papers (but not recently) and don't have a complete grasp of the ERP literature. I tend to rank the goal state like this:

  1. make it easy to get measures that are super commonly used
  2. make it possible to get less commonly used measures
  3. make our API as consistent as possible while respecting (1) and (2)

Does knowing that resolve any questions in your mind @withmywoessner? If not, we could ping a couple other folks to weigh in on what should be included in "super common" vs "less common"

drammock avatar Feb 16 '24 14:02 drammock

@drammock @larsoner Thank you for the responses. I also think it's important to match existing functionality to make it less confusing, but my main question was: if we are going in the direction of returning a single value for all channels, what single value should get_fractional_latency return? I guess it makes the most sense to return the SMALLEST latency, which kinda matches the functionality of get_peak.

withmywoessner avatar Feb 16 '24 16:02 withmywoessner

if we are going in the direction of returning a single value for all channels, what single value should get_fractional_latency return

Sorry if I've missed the discussion/resolution of this question, but are we leaning in the direction of returning a single value across all channels? I know that's what evoked.get_peak() does, and there was some discussion of the question here and in the 3 comments following that one. There I said basically "just use .copy().pick() first", but in hindsight, I'm feeling now like if we're going to go to the trouble of adding functions that are useful for ERP researchers, we should do what they would expect / find most useful. My (possibly wrong or outdated?) impression is that ERP researchers often look at each channel separately. If that's right, shouldn't these new functions operate separately on each channel?

From an API perspective, it's easy enough to have a switch like channelwise=True|False or average=True|False or combine=None|"mean"|"median" that lets the user decide whether to aggregate across channels or not --- but let's only offer that option if it's something the ERP researchers want / will find useful.

A further question is that if we do offer to aggregate, does that mean "combine channels first, then compute the metric" or does it mean "compute metric for each channel, and then combine the per-channel metrics"? If the latter makes more sense, we could at least consider not building in the aggregation (I think we can return a numpy array or pandas series and expect that people can use its .mean() method if they want to).
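A toy example (made-up data, with peak amplitude standing in for the metric) showing that the two orders generally disagree:

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.standard_normal((5, 100))        # (n_channels, n_times)
    peak_of_mean = data.mean(axis=0).max()      # combine channels first, then compute
    mean_of_peaks = data.max(axis=1).mean()     # compute per channel, then combine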

I guess it makes the most sense to return the SMALLEST latency, which kinda matches the functionality of get_peak.

When dealing with latencies, yeah, returning the smallest could be one sensible way to aggregate across channels. So there are further questions around how each metric should be aggregated (is there one obviously right choice for each metric, or do we need to let the users pick? etc). But the question "should we aggregate at all" needs to be decided first.

drammock avatar Feb 16 '24 18:02 drammock

Thanks for the in-depth response @drammock! I am also confused about what approach is best. I did some looking, and in this tutorial, Steven J. Luck (who created ERPLAB) mentions that it is common to average across channels (link):

However, it’s often valuable to include the data from multiple channels. One way to do that is to measure from multiple different channels and include channel as a factor in the statistical analysis. However, that adds another factor to the analysis, which increases the number of p values and therefore increases the probability of getting bogus-but-significant effects. Also, interactions between an experimental manipulation and electrode site are difficult to interpret (Urbach & Kutas, 2002, 2006). In most cases, I therefore recommend averaging the waveforms across the channels where the component is large, creating a cluster, and then obtaining the amplitude or latency scores from the cluster waveform. Averaging across channels in this manner tends to produce a cleaner waveform, which decreases the measurement error (as long as we don’t include channels where the ERP effect is substantially smaller). Also, it avoids the temptation to “cherry-pick” the channel with the largest effect.

However, talking to a member of my lab he stated that they prefer to find the peaks/areas of each channel then do a correction for multiple comparisons if they end up doing some statistical analysis.

Sorry if I've missed the discussion/resolution of this question, but are we leaning in the direction of returning a single value across all channels?

Personally, I am leaning more towards returning all the channels even though that does not match the functionality of get_peak(). I think it would make aggregation much easier.

A further question is that if we do offer to aggregate, does that mean "combine channels first, then compute the metric" or does it mean "compute metric for each channel, and then combine the per-channel metrics"? If the latter makes more sense, we could at least consider not building in the aggregation (I think we can return a numpy array or pandas series and expect that people can use its .mean() method if they want to).

From what I have read, I think it would be good to include an option to aggregate across channels before computing measures. If people are confused about how to do the latter method of averaging, I could always write a tutorial later, even if it is quite simple.

Also, I know this may be a bad idea, but would it be horrible if we made another get_peak function inside of the erp module that included the extra features like aggregating across channels? I guess these parameters could be added to the existing function; it just bothers me that the current function is called on an Evoked object while the ones I am making will be 'global' functions.

withmywoessner avatar Feb 16 '24 20:02 withmywoessner

If Steve Luck recommends (at least in some cases) aggregating across channels before computing these metrics, then we should make it easy to do that. But I do lean toward making it optional via average=True|False or similar (as suggested above). I think both average and combine are parameters used elsewhere in our codebase for similar needs, so I'm not 100% sure which one to pick: combine is the most flexible as it doesn't bake in a single aggregation method, but if everyone always only uses mean anyway, we might as well just have a simple boolean like average=False|True.

would it be horrible if we made another get_peak function inside of the erp module that included the extra features like aggregating across channels?

horrible because of name clash? I.e., is it acceptable to have both evoked.get_peak() method and mne.stats.erp.get_peak() function? It's not ideal, but I feel like even if we called the function something else like get_erp_peak(), there's still a risk that people might accidentally discover/try evoked.get_peak() and be sad that it doesn't easily do what they want. So distinct names don't completely solve that problem. That said, calling all of these functions get_erp_*() might be a good idea anyway --- it would reinforce that these are (probably) the measures you want and are familiar with if you're an ERP researcher. I'm open to other opinions/ideas about naming too though.

drammock avatar Feb 16 '24 20:02 drammock

horrible because of name clash? I.e., is it acceptable to have both evoked.get_peak() method and mne.stats.erp.get_peak() function? It's not ideal,

Yes that's what I was referencing.

@drammock What if we have a parameter compute (this is probably not the best name, so other suggestions would be very much appreciated; maybe compute_on?) that specifies what sort of data the metric is computed on and what channels it returns.

So for instance, mne.stats.erp.get_peak() would have compute='maximum' by default so that it matches the functionality of evoked.get_peak(); that way users would not get confused, because the two would do the same thing by default. compute would also have options for all and average. mne.stats.erp.get_fractional_area_latency() could also have a special option for compute, like compute='earliest', that sort of matches the way evoked.get_peak() works. However, this may be too much to place on the shoulders of one lone variable :fearful:

withmywoessner avatar Feb 22 '24 04:02 withmywoessner

Hey @larsoner @drammock, just pinging here to see if you had any thoughts on my suggested API implementation; no pressure to look at this now, though. Thank you!

withmywoessner avatar Feb 28 '24 19:02 withmywoessner

What if we have a parameter compute [...] that specifies what sort of data the metric is computed on and what channels it returns.

I'm struggling to keep track of what all the desired computations are, which makes it hard to weigh in on whether get_peak(..., compute="whatever") is a good way to expose all the desired possibilities. Can you make a list or table showing all the computations you want to support, and then we can work from there to decide how best to group those computations into combinations of functions and parameter values? Example:

  • get fractional peak latency of each channel separately
  • get fractional peak latency after averaging channels
  • get fractional area latency ...
  • get mean amplitude over a window ...

Side note: remember that if it is best/easiest from a UX perspective for the user to have 10 distinct functions with clear, descriptive function names and few or no parameters to supply, then we can build that as the public API --- even if under the hood they all just wrap to a (more complicated) private function (if that's the most efficient / least redundant way to actually do the computations).
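A sketch of that wrapper pattern (all names here are hypothetical, not proposed API):

    def _erp_measure(evoked, kind, tmin=None, tmax=None):
        ...  # the shared (more complicated) computation lives here

    def get_erp_peak(evoked, tmin=None, tmax=None):
        return _erp_measure(evoked, "peak", tmin=tmin, tmax=tmax)

    def get_erp_area(evoked, tmin=None, tmax=None):
        return _erp_measure(evoked, "area", tmin=tmin, tmax=tmax)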

drammock avatar Mar 01 '24 21:03 drammock

I'm struggling to keep track of what all the desired computations are, which makes it hard to weigh in on whether the get_peak(..., compute="whatever") is a good way to expose all the desired possibilities. Can you make a list or table showing all the computations you want to support...

Sure, no problem @drammock! Let me know if this makes it more clear.

  • get_fractional_area_lat(fractional_area, tmin, tmax, mode, compute...)

    • get_fractional_area_lat(fractional_area=.9, tmin, tmax, mode, compute='all'...)
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode, compute='average'...)
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode, compute='earliest'...)
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='intg', compute...)
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='abs', compute...)
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='neg', compute...)
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='pos', compute...)
  • get_area(tmin, tmax, mode, compute...)

    • get_area(tmin, tmax, mode, compute='all'...)
    • get_area(tmin, tmax, mode, compute='average'...)
    • get_area(tmin, tmax, mode, compute='maximum'...)
    • get_area(tmin, tmax, mode='intg', compute...)
    • get_area(tmin, tmax, mode='pos', compute...)
    • get_area(tmin, tmax, mode='neg', compute...)
    • get_area(tmin, tmax, mode='abs', compute...)
  • get_peak(tmin, tmax, mode, compute...)

    • get_peak(tmin, tmax, mode, compute='average'...)
    • get_peak(tmin, tmax, mode, compute='all'...)
    • get_peak(tmin, tmax, mode, compute='maximum'...)
    • get_peak(tmin, tmax, mode='abs', compute...)
    • get_peak(tmin, tmax, mode='pos', compute...)
    • get_peak(tmin, tmax, mode='neg', compute...)
  • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode, compute, return_offset...)

    • get_fractional_peak_lat(fractional_peak=.9, tmin, tmax, mode, compute='all'...)
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode, compute='average'...)
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode, compute='earliest'...)
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode='abs', compute...)
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode='neg', compute...)
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode='pos', compute...)
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, compute, return_offset=True...)

  • Areas: mode='intg' -> compute across the normal (signed) integral
  • Amps and areas: mode='pos' -> compute across positive values; mode='neg' -> compute across negative values; mode='abs' -> compute across the absolute values of the signal
  • compute='all' -> compute the metric for each channel, then return all channels
  • compute='avg' -> average across all channels in the Evoked object, then compute the metric on the average
  • For amps and areas: compute='maximum' -> compute the metric across all channels but only return the absolute-value maximum of the values considered (neg, pos). This is probably confusing and should be changed; I just included it to match the functionality of Evoked.get_peak()
  • For latencies: compute='earliest' -> compute the metric across all channels but only return the earliest latency
  • return_offset -> return the onset as well as the 'offset': the first time after the peak that the ERP drops to X% as large as the peak. [image: fractional peak onset/offset diagram]
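A hedged sketch of how the mode values above could be applied to the data before measuring (data shaped (n_channels, n_times); names are illustrative only):

    import numpy as np

    def _apply_mode(data, mode):
        if mode == "pos":
            return np.clip(data, 0, None)  # zero out negative values
        if mode == "neg":
            return np.clip(data, None, 0)  # zero out positive values
        if mode == "abs":
            return np.abs(data)
        return data  # 'intg': signed values, so negative area subtracts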

withmywoessner avatar Mar 01 '24 22:03 withmywoessner

fractional_peak will be renamed to fractional_amp

withmywoessner avatar Mar 01 '24 22:03 withmywoessner

Hey @drammock! Just pinging here to see if you have a chance to look over my suggestions because I saw you edited one of my posts. Let me know if I am pinging too much. Thank you!

withmywoessner avatar Mar 08 '24 19:03 withmywoessner

Hey @drammock! Just pinging here to see if you have a chance to look over my suggestions because I saw you edited one of my posts. Let me know if I am pinging too much. Thank you!

Edited other post because it cross-ref'd the wrong issue --- I look quickly at (nearly) every new issue/PR as it comes in if I can, just to mentally triage its urgency/difficulty and chime in as appropriate. Rest assured that this one is still on my radar, but there's a lot going on right now and I haven't had time to wrap my head around the (long!) list of supported functionality. One thing slowing me down is that I asked for / expected a list of prose descriptions of the functionality you want to support (as a way of helping figure out what the API should look like), whereas what you wrote is a mock-up of the API call for each case.

drammock avatar Mar 08 '24 21:03 drammock

Hey @drammock! Just pinging here to see if you have a chance to look over my suggestions because I saw you edited one of my posts. Let me know if I am pinging too much. Thank you!

Edited other post because it cross-ref'd the wrong issue --- I look quickly at (nearly) every new issue/PR as it comes in if I can, just to mentally triage its urgency/difficulty and chime in as appropriate. Rest assured that this one is still on my radar, but there's a lot going on right now and I haven't had time to wrap my head around the (long!) list of supported functionality.

Okay, I understand, thanks for letting me know!

One thing slowing me down is that I asked for / expected a list of prose descriptions of the functionality you want to support (as a way of helping figure out what the API should look like). What you wrote is a mock-up of the API call for each case.

I placed prose descriptions of every keyword and value that keyword can take at the bottom. I thought it would be easier to see everything laid out in the most succinct format, instead of just having a prose description of each, so that if I miss something it is easier to see.

If I did a prose description there would be 12 possible combinations for some of them and I thought that would be hard to parse.

EDIT: I'll add a prose description below each.

withmywoessner avatar Mar 08 '24 21:03 withmywoessner

  • get_fractional_area_lat(...) operations:

    • get_fractional_area_lat(fractional_area=.9, tmin, tmax, mode, compute='all'...)

      • Compute the latency where 90% of the area is reached for each channel, then return all channels.
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode, compute='average'...)

      • Compute the average latency across all channels where a specified fractional area is reached.
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode, compute='earliest'...)

      • Compute the earliest latency across all channels where a specified fractional area is reached.
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='intg', compute...)

      • Compute the latency for reaching a fractional area by integrating over the signal from tmin to tmax.
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='abs', compute...)

      • Compute the latency for reaching a fractional area across the absolute values of the signal.
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='neg', compute...)

      • Focus on negative values of the signal to compute the latency for reaching a fractional area.
    • get_fractional_area_lat(fractional_area, tmin, tmax, mode='pos', compute...)

      • Focus on positive values of the signal to compute the latency for reaching a fractional area.
  • get_area(...) operations:

    • get_area(tmin, tmax, mode, compute='all'...)

      • Compute the area under the curve for each channel between tmin and tmax, then return all channels.
    • get_area(tmin, tmax, mode, compute='average'...)

      • Average the signal across all channels, then compute the area between tmin and tmax.
    • get_area(tmin, tmax, mode, compute='maximum'...)

      • Compute the area for all channels, but only return the maximum absolute value of these areas.
    • get_area(tmin, tmax, mode='intg', compute...)

      • Compute the area through normal integration between tmin and tmax.
    • get_area(tmin, tmax, mode='pos', compute...)

      • Compute the area only considering positive values of the signal.
    • get_area(tmin, tmax, mode='neg', compute...)

      • Compute the area only considering negative values of the signal.
    • get_area(tmin, tmax, mode='abs', compute...)

      • Compute the area considering the absolute values of the signal.
  • get_peak(...) operations:

    • get_peak(tmin, tmax, mode, compute='average'...)

      • Compute the average peak value across all channels within the specified timeframe.
    • get_peak(tmin, tmax, mode, compute='all'...)

      • Compute the peak for each channel between tmin and tmax, then return all channels.
    • get_peak(tmin, tmax, mode, compute='maximum'...)

      • Compute the peak for all channels, but only return the maximum absolute value peak.
    • get_peak(tmin, tmax, mode='abs', compute...)

      • Compute the peak considering the absolute values of the signal within the timeframe.
    • get_peak(tmin, tmax, mode='pos', compute...)

      • Compute the peak focusing on positive values of the signal.
    • get_peak(tmin, tmax, mode='neg', compute...)

      • Compute the peak focusing on negative values of the signal.
  • get_fractional_peak_lat(...) operations:

    • get_fractional_peak_lat(fractional_peak=.9, tmin, tmax, mode, compute='all'...)

      • Compute the latency to reach 90% of the peak for each channel and return all results.
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode, compute='average'...)

      • Compute the average latency across all channels to reach a specified fractional peak.
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode, compute='earliest'...)

      • Determine the earliest latency across all channels to reach a specified fractional peak.
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode='abs', compute...)

      • Focus on absolute values to compute the latency for reaching a fractional peak.
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode='neg', compute...)

      • Focus on negative values for computing the latency to a fractional peak.
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, mode='pos', compute...)

      • Focus on positive values for computing the latency to a fractional peak.
    • get_fractional_peak_lat(fractional_peak, tmin, tmax, compute, return_offset=True...)

      • Compute the latency to a fractional peak and return both the onset and the 'offset' - the time after the peak when the signal drops to a specified fraction of the peak value.

  • Areas: mode='intg' -> compute across the normal (signed) integral
  • Amps and areas: mode='pos' -> compute across positive values; mode='neg' -> compute across negative values; mode='abs' -> compute across the absolute values of the signal
  • compute='all' -> compute the metric for each channel, then return all channels
  • compute='avg' -> average across all channels in the Evoked object, then compute the metric on the average
  • For amps and areas: compute='maximum' -> compute the metric across all channels but only return the absolute-value maximum of the values considered (neg, pos). This is probably confusing and should be changed; I just included it to match the functionality of Evoked.get_peak()
  • For latencies: compute='earliest' -> compute the metric across all channels but only return the earliest latency
  • return_offset -> return the onset as well as the 'offset': the first time after the peak that the ERP drops to X% as large as the peak. [image: fractional peak onset/offset diagram]

withmywoessner avatar Mar 08 '24 22:03 withmywoessner

I placed prose descriptions of every keyword and value that keyword can take at the bottom

Yes, I saw that. It requires scrolling/scanning back and forth between the API descriptions and the "key".

I thought it would be easier to see everything laid out in the most succinct format, instead of just having a prose description of each, so that if I miss something it is easier to see.

Could you do what you need to do locally to make sure you've included all possibilities, rather than imposing that structure here?

If I did a prose description there would be 12 possible combinations for some of them and I thought that would be hard to parse.

I appreciate the "trying to be helpful" aspect of this choice. But so there is no further confusion: it would be easier for me to see descriptions in prose when I'm trying to figure out what an API should look like. Describing what you want to do already in the form of an API makes that task harder for me, because I have to first "un-api" your proposal in order to imagine / visualize what other possible APIs might look like. Feel free to use nested bullets if that helps, or comments like "each of the next 5 bullets also should have a way to specify a restricted time window over which to operate" or whatever --- in other words you don't necessarily need to spell out every detail on every line, if it helps clarity.

drammock avatar Mar 08 '24 22:03 drammock

ALL functions will have parameters to restrict the time window.

  • get_fractional_area_lat(...) operations:

    • Each of these three also has parameters to compute over/using the integral, positive, negative, or absolute values of the signal.

      • Compute the latency where X% of the area is reached for each channel, then return all channels.

      • Compute the average latency across all channels where a specified fractional area is reached by first averaging the signal across all channels.

      • Compute the earliest latency across all channels where a specified fractional area is reached, by finding all latencies then returning the earliest.

    • Each of the next also has parameters that describe which channels to return (all, average, or most extreme value across all channels).

      • Compute the latency for reaching a fractional area by integrating over the signal.

      • Compute the latency for reaching a fractional area across the absolute values of the signal.

      • Compute the latency for reaching a fractional area across the negative values of the signal.

      • Compute the latency for reaching a fractional area across the positive values of the signal.

  • get_area(...) operations:

    • Each of these three also has parameters to compute using/over the integral, positive, negative, or absolute values of the signal.

      • Compute the area for each channel then return all channels.

      • Compute the average area across all channels by first averaging the signal across all channels.

      • Compute the maximum area across all channels by finding all areas, then returning the maximum.

    • Each of the next also has parameters that describe which channels to return (all, average, or most extreme value across all channels).

      • Compute the area through normal integration.

      • Compute the area across the absolute values of the signal.

      • Compute the area across the negative values of the signal.

      • Compute the area across the positive values of the signal.

  • get_peak(...) operations:

    • Each of these three also has parameters to compute over positive, negative, or absolute values of the signal, and whether to return strictly positive or negative peaks.
      • Compute the peak for each channel then return all channels.

      • Compute the average peak across all channels by first averaging the signal across all channels.

      • Compute the peak for all channels, but only return the maximum absolute value (most extreme) peak.

    • Each of the next also has parameters that describe which channels to return (all, average, or most extreme value across all channels), and whether to return strictly positive or negative peaks.
      • Compute the peak across the absolute values of the signal.

      • Compute the peak across the negative values of the signal.

      • Compute the peak across the positive values of the signal.

    • The following parameter will have options to specify which channels to return (all, average, or most extreme value across all channels), and to compute using positive, negative, or absolute values of the signal.
      • Whether to return strictly positive or negative peaks (vs. max/min behavior).
  • get_fractional_peak_lat(...) operations:

    • Each of these three also has parameters to compute over positive, negative, or absolute values of the signal, whether to return the onset and offset, and whether to find strictly positive or negative peaks.

      • Compute the latency to reach X% of the peak for each channel, then return all channels.

      • Compute the average latency across all channels to reach a specified fractional peak by first averaging the signal across all channels.

      • Determine the earliest latency across all channels to reach a specified fractional peak by finding all latencies then returning the earliest.

    • Each of the next also has parameters that describe which channels to return (all, average, or earliest value across all channels), whether to return the onset and offset, and whether to find strictly positive or negative peaks.

      • Compute the latency for reaching a fractional peak across the absolute values of the signal.

      • Compute the latency for reaching a fractional peak across the negative values of the signal.

      • Compute the latency for reaching a fractional peak across the positive values of the signal.

    • The following parameter will have options to specify which channels to return (all, average, or earliest value across all channels), to compute using positive, negative, or absolute values of the signal, and whether to find strictly positive or negative peaks.

      • Compute the latency to a fractional peak and return both the onset and the 'offset' - the time after the peak when the signal drops to a specified fraction of the peak value.
    • The following parameter will have options to specify which channels to return (all, average, or earliest value across all channels), to compute using positive, negative, or absolute values of the signal, and whether to return the onset and offset.

      • Whether to return strictly positive or negative peaks (vs. max/min behavior).

withmywoessner avatar Mar 08 '24 23:03 withmywoessner

OK @withmywoessner sorry for the delay. Here is my take on what the API should be like:

  1. function names get_erp_peak, get_erp_area, get_erp_fractional_peak_latency, and get_erp_fractional_area_latency. I acknowledge that the latter two are very long, but I think there's an advantage in having get_erp_<TAB> bring up all four options in people's editors/iPython terminals.
  2. all four functions get tmin and tmax for bounding the computation in the time domain
  3. all four functions get a parameter mode for how to compute the peak or area.
     a. For peak functions the modes are pos, neg, abs (or maybe positive negative absolute? I'm waffling on that point; maybe best to ask for the full form of each word, but silently also accept the shorter forms too).
        - These determine whether to only consider extrema in one direction, the other direction, or both directions.
        - These affect both the method for finding the overall peak and also the method for finding the fractional value.
     b. For area functions the modes are pos, neg, abs, integrate. integrate effectively subtracts the negative areas from the positive ones.
        - As above for "peak", these affect the method for finding both overall area and fractional area.
  4. all four functions get a parameter average (boolean) indicating whether to average the signals together first, or operate channel-wise.
    • Also OK would be average: None | "mean" | callable if we think users will ever want to aggregate across channels in some other way besides arithmetic mean.
    • Probably fine to start with False | True and add the other options later only if users ask for it.
  5. Peak-related functions get a parameter strict (boolean) indicating whether to error if a positive peak was requested but the whole signal is negative (or vice versa).
  6. I don't think we should add parameters / param values to return only the (biggest, earliest) area or peak across channels, because it's not obvious to me that e.g. "earliest" is always what users will want.
  7. In light of point (6), we should talk about what form the output takes. If we're going to force users to use np.min/np.argmin to find e.g. earliest latency (and the channel name where it occurs) we could consider returning something a bit more structured like a Pandas series, named tuple, dict, or similar.

WDYT about that proposal?
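In signature form, that proposal might look something like this (a sketch for discussion only; the frac parameter name and all defaults are placeholders, not a settled API):

    def get_erp_peak(evoked, tmin=None, tmax=None, mode="abs",
                     average=False, strict=True):
        """Peak amplitude and latency, per channel or channel-averaged."""

    def get_erp_area(evoked, tmin=None, tmax=None, mode="intg", average=False):
        """Signed or rectified area under the curve."""

    def get_erp_fractional_peak_latency(evoked, frac=0.5, tmin=None, tmax=None,
                                        mode="abs", average=False, strict=True):
        """Latency at which the signal reaches `frac` of the peak amplitude."""

    def get_erp_fractional_area_latency(evoked, frac=0.5, tmin=None, tmax=None,
                                        mode="intg", average=False):
        """Latency at which the cumulative area reaches `frac` of the total."""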

drammock avatar Mar 12 '24 21:03 drammock

Thanks!

  1. function names get_erp_peak, get_erp_area, get_erp_fractional_peak_latency, and get_erp_fractional_area_latency. I acknowledge that the latter two are very long, but I think there's an advantage in having get_erp_<TAB> bring up all four options in people's editors/iPython terminals.

Sounds good to me! Do you think we could shorten it a bit to get_erp_frac_area_lat?

  1. all four functions get a parameter mode for how to compute the peak or area. a. For peak functions the modes are pos, neg, abs (or maybe positive negative absolute? I'm waffling on that point; maybe best to ask for the full form of each word, but silently also accept the shorter forms too)

I think evoked.get_peak() accepts pos, neg, abs, so it may be best to accept both forms to be consistent.

b. For area functions the modes are pos, neg, abs, integrate. integrate effectively subtracts the negative areas from the positive ones.

Could we also accept 'intg' in addition to integrate? I am personally a fan of shorter parameters, and the function names are kinda long as well.

  • Probably fine to start with False | True and add the other options later only if users ask for it.

I like this more.

  1. In light of point (6), we should talk about what form the output takes. If we're going to force users to use np.min/np.argmin to find e.g. earliest latency (and the channel name where it occurs) we could consider returning something a bit more structured like a Pandas series, named tuple, dict, or similar.

Is a pandas dataframe all right? The only problem is it might be confusing if people want to run operations like np.min/np.argmin, but we could just include an example of how to find stats using pandas operations. We can also return all the variable types that way:

get_erp_area

       Area Channel
    0   0.8      fz
    1   0.9      cz

get_erp_peak

       amplitude Channel  latency
    0        1.3      fz      0.6
    1        1.5      cz      0.7

get_erp_fractional_peak_latency

       fractional_peak_latency_onset Channel  amplitude  fractional_peak_latency_offset
    0                            1.3      fz         56                              2.3
    1                            1.5      cz         78                              2.5

get_erp_fractional_area_latency

       fractional_area_latency Channel  Area
    0                      1.3      fz    56
    1                      1.5      cz    78

Note the extra columns: 'area' in get_erp_fractional_area_latency is the area of the region specified by the fractional area percentile.

'fractional_peak_latency_onset' and 'fractional_peak_latency_offset' are the latencies before and after the peak at the specified fractional peak percentile in get_erp_fractional_peak_latency. [image: fractional peak onset/offset diagram]

'amplitude' is the value of the amplitude at the fractional peak latency in get_erp_fractional_peak_latency

These may be unnecessary but maybe some people would like them just in case.

EDIT: The evoked.get_peak() function also includes a merge_grads parameter. I am not that familiar with MEG research. Is this something we should also include?

withmywoessner avatar Mar 12 '24 22:03 withmywoessner

Also, sorry for the confusion about explaining my proposed change. I marked them as outdated because I was planning on restating my implementation in clearer prose, so I am glad you were able to get the gist of it. Also, just curious, how long is it best to wait before pinging you (and other maintainers), usually? If you prefer to not be pinged at all, that's fine as well!

withmywoessner avatar Mar 13 '24 00:03 withmywoessner

Is a pandas dataframe all right?

This may actually not be the best way to go, because it is often the case that one wants to find multiple peaks of an ERP component, so maybe a simpler output is more appropriate.

withmywoessner avatar Mar 13 '24 01:03 withmywoessner

Do you think we could shorten it a bit to get_erp_frac_area_lat?

compromise on get_erp_frac_area_latency? Shortening fractional to frac seems OK; most folks will at least get the gist of what is intended, I think. But lat feels like it's harder to guess what is meant.

Could we also accept 'intg' in addition to integrate?

sure that seems reasonable. I hesitated because neg/pos/abs felt obvious, but how to shorten "integrate" felt less obvious. But seeing it in writing, intg seems OK, and if we have short forms for all of them, might as well just always require the short form / not bother handling the long form.

Is a pandas dataframe all right?

Pandas is not a hard dependency of MNE-Python, so we should try to come up with a simpler approach that is still reasonably user-friendly. If we do support pandas output, we probably need an output="pandas"|"dict" or similar.

That said, pandas output seems pretty natural here; I was imagining a Series with the channel names as the index, which would be nice because things like .min() or .mean() would "just work" (they won't work naturally if one column is all strings). I was forgetting that some of these functions return 2 things (e.g. peak amplitude and peak latency), but I think the same logic applies there (a dataframe with channel names as the index; e.g. dataframe.min() by default is column-wise, which is what folks probably want).
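For example (made-up values, channel names as the index):

    import pandas as pd

    out = pd.DataFrame(
        {"latency": [0.106, 0.112], "amplitude": [2.1e-6, 1.8e-6]},
        index=pd.Index(["EEG 001", "EEG 002"], name="channel"),
    )
    out.min()                # column-wise minima "just work"
    out["latency"].idxmin()  # channel name with the earliest latency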

As for what to do for non-Pandas output, here are some options:

  • dict-of-named-tuples, where the dict keys are the output measurement names (latency, amplitude) and the namedtuple names are channel names
  • tuple-of-arrays, e.g., array of latencies, array of amplitudes, and (possibly) array of channel names. I forgot to include this in my proposal, but these functions all also take a picks argument, right? In which case (depending on what way of specifying picks that we allow) it's not necessarily safe to assume that the user already has the channel names ready-to-hand (e.g., if they used picks="eeg" they might not).
  • something else? happy to hear others' input here.

fractional_peak_latency_offset

ah yes, I had seen this before but lost my comments on it in the copy-paste into the GH UI. This needs a bit of clarification I think, as to what is actually desired. Here are two possibilities that don't necessarily give the same answer:

  • side: left, right, both. Whether to approach the peak from the negative or positive direction on the time axis; right gives "the time at which the amplitude drops below N percent of the peak value and never goes back above N percent."

  • which: onset, offset, both. offset gives "the first time after the peak at which the signal drops below N percent of the peak value".

drammock avatar Mar 13 '24 20:03 drammock

compromise on get_erp_frac_area_latency?

Sounds good @drammock!

we probably need an output="pandas"|"dict" or similar.

Sounds good as well!

As for what to do for non-Pandas output, here are some options:

  • dict-of-named-tuples, where the dict keys are the output measurement names (latency, amplitude) and the namedtuple names are channel names
  • tuple-of-arrays, e.g., array of latencies, array of amplitudes, and (possibly) array of channel names. I forgot to include this in my proposal, but these functions all also take a picks argument, right? In which case (depending on what way of specifying picks that we allow) it's not necessarily safe to assume that the user already has the channel names ready-to-hand (e.g., if they used picks="eeg" they might not).

Yeah, so maybe dict-of-named-tuples is the best decision for now.
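For concreteness, that shape might look like this (channel and measure names made up):

    from collections import namedtuple

    ChannelValues = namedtuple("ChannelValues", ["Fz", "Cz"])
    result = {
        "latency": ChannelValues(0.106, 0.112),
        "amplitude": ChannelValues(2.1e-6, 1.8e-6),
    }
    result["latency"].Fz  # -> 0.106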

fractional_peak_latency_offset

ah yes, I had seen this before but lost my comments on it in the copy-paste into the GH UI. This needs a bit of clarification I think, as to what is actually desired. Here are two possibilities that don't necessarily give the same answer:

  • side: left, right, both. Whether to approach the peak from the negative or positive direction on the time axis; right gives "the time at which the amplitude drops below N percent of the peak value and never goes back above N percent."
  • which: onset, offset, both. offset gives "the first time after the peak at which the signal drops below N percent of the peak value".

Just to clarify, is this a case where they would give a different result? [image: example waveform]

  • which: onset, offset, both offset gives "the first time after the peak at which the signal drops below N percent of the peak value".

From ERPLab documentation:

We also include an option to find the 'offset' - the first time after the peak that the ERP drops to 50% as large as the peak

I think we should at least support this to make it concordant with the ERPLab implementation, which a lot of people may be used to. How about we include a parameter that allows one to specify the number of times the peak is allowed to cross the specified amplitude fraction, and we could include an option 'last' (or maybe even -1) to find the last time it crosses the threshold without crossing again? That way we can support both use cases: [image: threshold-crossing examples]
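A toy numpy sketch (made-up bimodal waveform, not MNE code) of the two readings: the ERPLab-style first post-peak drop versus the last time the signal falls below the threshold for good:

    import numpy as np

    times = np.linspace(0.0, 0.5, 251)
    # Two bumps, so the signal re-crosses the 50% threshold after the peak.
    x = (np.exp(-((times - 0.20) ** 2) / 0.001)
         + 0.8 * np.exp(-((times - 0.35) ** 2) / 0.001))
    peak_idx = int(np.argmax(x))
    above = x >= 0.5 * x[peak_idx]
    onset = times[np.argmax(above)]            # first crossing of the threshold
    post_below = np.nonzero(~above[peak_idx:])[0]
    offset = times[peak_idx + post_below[0]]   # first drop after the peak (ERPLab-style)
    last_above = np.nonzero(above)[0][-1]
    final_offset = times[last_above + 1]       # drops and never goes back above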

withmywoessner avatar Mar 13 '24 20:03 withmywoessner