MIES icon indicating copy to clipboard operation
MIES copied to clipboard

Add new mode for SF average operation

Open MichaelHuth opened this issue 1 month ago • 17 comments

close #2496

MichaelHuth avatar Nov 11 '25 17:11 MichaelHuth

@timjarsky Here is a first version to play with:

The new mode is called "sweepgroup".

For this mode avg requires as first argument an array of at least two selections.

group1 = select(selsweeps(0),selvis(all))
group2 = select(selsweeps(1),selvis(all))
group3 = select(selsweeps(0,1),selvis(all))

avg([$group1, $group2, $group3],sweepgroup)

What it does is the following:

  • the sweep data for each selection is retrieved
  • the sweep data of the nth entry of each selection is averaged

Selection to sweep data:

Sweep Sweep Sweep Sweep
Group 1 0 DA 0 AD - -
Group 2 1 DA 1 AD - -
Group 3 0 DA 0 AD 1 DA 1 AD

averaging now is done along the rows:

  • trace 1: avg(0 DA, 1 DA, 0 DA)
  • trace 2: avg(0 AD, 1 AD, 0 AD)
  • trace 3: avg(-, -, 1 DA)
  • trace 4: avg(-, -, 1 AD)

The non existing sweeps in a row are ignored.

The columns in the upper table are the sweep data that was retrieved for a given selection. There are selection conditions that can result in no sweep data, e.g. when an epoch was chosen that does not exist in a specific sweep. So when doing that average one has to take care that the column to row relation in the table is actually correct.

MichaelHuth avatar Nov 12 '25 16:11 MichaelHuth

@t-b @MichaelHuth does this need to be limited to select operations as input? For example, I'd like to pass in apfrequency data (see sample SF data below).

selbase = select(selrange(E1),selchannels(AD0), selvis(all),selsweeps(), selstimset("*rheo*", "*supra*"))

selexpA = select(selexp("*.10A.*"), $selbase)
selexpB = select(selexp("*.10B.*"), $selbase)

datA = data($selexpA)
datB = data($selexpB)

apfrequency($datA)

and

apfrequency($datB)

and

avg(apfrequency($datA), apfrequency($datB), sweepgroup)

timjarsky avatar Nov 14 '25 22:11 timjarsky

@timjarsky I made a change such that any dataset type elements for the first argument are accepted. For your sketched case the last line must be changed to avg([apfrequency($datA), apfrequency($datB)], sweepgroup) (note the enclosure in array brackets of the first argument of avg)

Now generally any input in the format avg([<dataset>,<dataset>, ... ], sweepgroup) is accepted.

If the dataset is of type selection then automatically data is applied, such that e.g. avg([select(),select()], sweepgroup) -> avg([data(select()),data(select())], sweepgroup). This implicit conversion also applies for over and in mode.

It might also make sense to rename the new mode from sweepgroup to bygroup reflecting the more generic argument handling. (I am open for suggestions)

The logic as sketched with the selections and the table above also applies for the generic dataset input for matching what gets averaged.

MichaelHuth avatar Nov 15 '25 03:11 MichaelHuth

Hi @MichaelHuth ,

I tested avg([apfrequency($datA), apfrequency($datB)], sweepgroup) and it removes the x-data, that is, each average point is plotted against a single x value, even if vs. is provided.

The end goal is to input multiple FI curves from individual experiments, and generate an average FI response (multiple average frequency vs average current data points)

timjarsky avatar Nov 17 '25 18:11 timjarsky

@t-b @MichaelHuth avgMethodTesting.pxp.zip

image

timjarsky avatar Nov 17 '25 23:11 timjarsky

@timjarsky I made a change such that meta data like the sweep numbers is transferred from the first group in the array to the average result:

image

The distribution on the x-axis results from sweep number meta data present in the result wave. In your data the two incoming groups have different sweep numbers, so on a numerical basis it does not make sense to join the sweep numbers somehow together.

The only other option would be to create some text wave where the sweep numbers of the incoming groups are joined like e.g. 69_19 and let the plotter do a category plot. With more input groups the category labels would get rather long.

MichaelHuth avatar Nov 18 '25 02:11 MichaelHuth

@MichaelHuth this question is tangental to this PR. I expect the following three code snippets to return the same thing, but one doesn't. Am I misunderstanding what the "intersection" of select operations means?

selrange("E1") is contained in a single select statement. plots the E1 epoch

fullSelect = select(selexp("*.10A.*"), $selbase, selchannels(AD0), selvis(all),selsweeps(), selstimset("*rheo*", "*supra*"), selrange("E1"))

data($fullSelect)

selrange("E1") is contained in the "base" select statement. Plots the entire stimulus set

selbase = select(selvis(all),selsweeps(), selstimset("*rheo*", "*supra*"), selrange("E1"))

selexpA = select(selexp("*.10A.*"), $selbase, selchannels(AD0))

data($selexpA)

selrange("E1") is in the combo select statement, plots the E1 epoch avgMethodTesting.pxp.zip

selbaseTWO = select(selvis(all),selsweeps(), selstimset("*rheo*", "*supra*"), )

selexpTWO = select(selexp("*.10A.*"), $selbaseTWO, selchannels(AD0), selrange("E1"))

data($selexpTWO)

do selectrange() and selectchannels() need to be in the same select statment for selectrange() to work?

timjarsky avatar Nov 18 '25 04:11 timjarsky

@MichaelHuth this question is tangental to this PR. I expect the following three code snippets to return the same thing, but one doesn't. Am I misunderstanding what the "intersection" of select operations means?

The range does not intersect. This is also in the select documentation: The range specified through selrange is always taken from the topmost select.

The background here is that selrange can contain either epoch specifications with optional wildcards or time intervals. The actual resolve of the epoch to time intervals is done when data is applied (i.e. when the sweep data is retrieved). But let's say we would already do that part of data in the select statement and resolve the time intervals. Then the next issue comes up and this is the question how to exactly do the range intersection?

If there is an inner selrange("E*") that would resolve to time intervals [100, 200] and [300, 400] and then there is an outer selrange to intersect with e.g. [150, 350] and [380, 500] then I can intersect every range with every other:

[100, 200] with [150, 350] -> [150, 200]
[100, 200] with [380, 500] -> none
[300, 400] with [150, 350] -> [300, 350]
[300, 400] with [380, 500] -> [300, 380]

Is this really useful for sweep data?

Another approach would be to do intersection through epoch names by the names, if both sides of the intersection contain the same name. e.g. "E*" would possibly resolve to "E1", "E2", "E3". Intersected with "E2" results in "E2".

But what about "E2" with "ST"?

So in summary, intersecting ranges with selects currently not done. It is in principle possible, if there is a clear definition how this intersection should operate as it is not straight forward. It would require additional changes in SF for select because the actual ranges would need to be determined earlier (now: in data, then additionally: in each select). It would make select slower.

MichaelHuth avatar Nov 18 '25 13:11 MichaelHuth

Hi @MichaelHuth , thank you for the clear explanation.

I define the intersection as the set of regions that overlap. So I would do the following:

[100, 200] with [150, 350] -> [150, 200] [100, 200] with [380, 500] -> none [300, 400] with [150, 350] -> [300, 350] [300, 400] with [380, 500] -> [380, 400]

However, I can work with it as it is and would want to talk more before implementing anything.

Thanks again.

timjarsky avatar Nov 19 '25 03:11 timjarsky

Hi @MichaelHuth, the latest changes may have broken averaging; I expect the bottom plot to contain the average of the to FI arrays in the top plot.

avgMethodTesting 2.pxp.zip

sel = select(selsweeps(), selstimset("*rheo*", "*supra*"), selvis(all))

selexpA = select(selexp("*.10A.*"), $sel, selchannels(AD0), selrange(E1))

selexpB = select(selexp("*.10B.*"), $sel, selchannels(AD0), selrange(E1))

selDAA = select(selexp("*.10A.*"), $sel, selchannels(DA0), selrange(E1))

selDAB = select(selexp("*.10B.*"), $sel, selchannels(DA0), selrange(E1))

freqA = apfrequency(data($selexpA))

currentA = max(data($selDAA))

freqB = apfrequency(data($selexpB))

currentB = max(data($selDAB))


$freqA

vs

$currentA

with

$freqB

vs 

$currentB

and

avg([$freqA, $freqB], sweepgroup)

vs

avg([$currentA, $currentB], sweepgroup)
image

timjarsky avatar Nov 19 '25 04:11 timjarsky

I looked into it and the issue is actually not the averaging. What happens is that for Min and Max the wave scale is copied to the result. Thus, the result value from Max has an x dimension offset that equals the time when epoch E1 starts. Now the E1 from selDAA and selDAB start at different times. Thus, the Max results have a different x-offset. The averaging is x-offset aware and does not return a single value but four in this case (due to average creating common x points). So the result is something like [557, NaN, NaN, 647].

Then for plotting our heuristics kicks in because the plotter sees then something like [4] vs [557, NaN, NaN, 647], where the number of points for an X vs Y plot do not match.

I am not sure what the original use case was that it makes sense to transfer the x-offset when doing max to the result. Our tests do not verify that behavior. I have removed the scale transfer and get: image

The result looks good imho. The colors in the upper plot do not match. I added small numbers to the first two pairs as indicators:

image

MichaelHuth avatar Nov 19 '25 05:11 MichaelHuth

@MichaelHuth okay, the latest change is helpful:

image

Now I want to subtract the first value in $currentA from all of $currentA. I imagined the syntax like this: $currentA - $currentA[0]. Is there a way to do that?

timjarsky avatar Nov 19 '25 19:11 timjarsky

At the moment there is not. We have an operation dataset to join data to a dataset, but we have no operation to extract a single dataset from an expression that returns multiple datasets. Indexing through brackets is not supported by SF.

Adding indexing support with brackets to SF would be a big change because the parser/executor would need changes for that. A way with less effort is to introduce a new operation like extract($data, <index>)

Such that you can write $currentA - extract($currentA, 0). The logic in minus extends the second term again temporarily to have the same number of datasets as $currentA (but with the value from the first).

I will add an operation for this, so you can progress.

MichaelHuth avatar Nov 20 '25 05:11 MichaelHuth

@timjarsky I added the extract operation that allows to extract a single dataset from a multi dataset expression. Your substraction expression looks like: $currentA - extract($currentA, 0) then.

I noticed that there is historically no explicit discrimination between arrays and datasets in SF. Thus, for some constructs the executor joins datasets to an array, especially when entered directly (not as result of an operation). This was not yet a big issue as we did not evaluated arrays of datasets. So some expressions like avg([dataset(1), dataset(2)],sweepgroup) give an error as currently the executor converts [dataset(1),dataset(2)] -> [1,2]. [1,2] is seen as a single dataset and then the sweepgroup mode is missing the second expected dataset.

MichaelHuth avatar Nov 21 '25 17:11 MichaelHuth

@MichaelHuth

Works!

sel = select(selsweeps(), selstimset("*rheo*", "*supra*"), selvis(all))

selexpA = select(selexp("*.10A.*"), $sel, selchannels(AD0), selrange(E1))

selexpB = select(selexp("*.10B.*"), $sel, selchannels(AD0), selrange(E1))

selDAA = select(selexp("*.10A.*"), $sel, selchannels(DA0), selrange(E1))

selDAB = select(selexp("*.10B.*"), $sel, selchannels(DA0), selrange(E1))

freqA = apfrequency(data($selexpA))

currentA = max(data($selDAA))
currentANorm = $currentA - extract($currentA, 0)

freqB = apfrequency(data($selexpB))

currentB = max(data($selDAB))
currentBNorm = $currentB - extract($currentB, 0)


$freqA

vs

$currentA - extract($currentA, 0)

with

$freqB 

vs 

$currentB - extract($currentB, 0)

with

avg([$freqA, $freqB], sweepgroup)

vs

avg([$currentANorm, $currentBNorm], sweepgroup)
image

Nitpick: the sweep numbering shown in the legend for the average trace reflects only one dataset, not both.

I can trigger a color selection assertion by wrapping the last two lines in a merge. I see you may have fixed this in a different PR, but I wanted to confirm.

..
merge(avg([$freqA, $freqB], sweepgroup))

vs

merge(avg([$currentANorm, $currentBNorm], sweepgroup))
################################
  !!! Assertion FAILED !!!
  Message: "Invalid color group"
  Please provide the following information if you contact the MIES developers:
  ################################
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Stacktrace:
  SF_button_sweepFormula_display(...)#L1667 [MIES_SweepFormula.ipf]
SF_FormulaPlotter(...)#L903 [MIES_SweepFormula.ipf]
SF_GetTraceColor(...)#L392 [MIES_SweepFormula.ipf]
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Time: 2025-11-22T08:38:57-08:00
  Locked device: [- none -]
  Current sweep: [- none -]
  DAQ: [- none -]
  Testpulse: [- none -]
  Acquisition state: [- none -]
  Experiment: avgMethodTesting (Packed)
  Igor Pro version: 9.06B01 (56565)
  MIES version:
  
  ################################

I plan to create a follow-up issue to enhance APfrequency so that generating these plots is less tedious when using pipeline data acquired with analysis functions.

timjarsky avatar Nov 22 '25 16:11 timjarsky

Nitpick: the sweep numbering shown in the legend for the average trace reflects only one dataset, not both.

The background for this is the following:

The sweepgroup average mode takes any datasets as groups as input. Each dataset has its meta data, like e.g. sweep number. Until now we did not have arguments in operations where we can input datasets with different meta data. The new average mode allows that: avg([$ds1, $ds2],sweepgroup) where $ds1 might be datasets with sweep number as meta data and $ds2 might be datasets that just contain some arbitrary wave data with different meta data.

Now we have some logic where we look at the meta data of a result in the plotter to determine how to display it on an x-axis. One thing evaluated is, if the data is a single data point and the meta data contains a sweep number. Then the plot with the known colored dots is created (as seen above).

Now in the data from the example experiment data from sweep 19 is averaged with data from sweep 69, 20 with 70 and so on. How to join these?

  1. Do not join, just use meta data of first group for avg result
  2. Generate arbitrary sweep numbers by just indexing the group, like 0, 1, 2, 3, 4, ...
  3. Combine sweep numbers by multiplying each subsequent groups sweep numbers by 100 -> 69019, 70020, 71021, ...
  4. Introduce virtual negative sweep numbers and reverse x-axis, like 0, -1, -2, -3, -4, ... (and tag result to contain virtual sweep numbers)
  5. Clear any meta data in the result -> all avg results in the example experiment get plotted at the same x
  6. .. probably more ways

And this sketches only possibilities if all groups have sweep numbers, what if one group has sweep numbers and the next has some different meta data property?

Currently I implemented point 1.

MichaelHuth avatar Nov 25 '25 15:11 MichaelHuth

I can trigger a color selection assertion by wrapping the last two lines in a merge. I see you may have fixed this in a different PR, but I wanted to confirm.

Yes I think this is fixed in main.

t-b avatar Nov 26 '25 20:11 t-b

I renamed the mode to just group.

MichaelHuth avatar Nov 28 '25 15:11 MichaelHuth