Reconsidering min()?
I've done a little experiment: I ran timeit.repeat with repeat=1000 on a really simple statement. After that, I sorted the list and grouped the results into time slices, counting how many benchmark values fall inside each slice. This way I get a sort of probability. Then I plotted the result:
https://www.dropbox.com/s/f4naryc055k42cs/2020-07-24-080340_1366x768_scrot.png?dl=0
The result is quite interesting. As you can see, the distribution is not a normal distribution. It seems to be multimodal, like a mixture of normal distributions. Furthermore, the highest probability is near the minimum value, not near the average (which lies on the second peak).
Mean and stdev are only meaningful if the distribution is normal. It seems that the only way to get something like a normal distribution is to consider only the first part of the plot.
Since pyperf returns mean and stdev, does it "cut" the slowest runs?
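For reference, the binning I mean is roughly this (a simplified sketch; the statement, setup and number of slices here are stand-ins for the real ones):

import timeit

# simplified stand-in for the real statement/setup used in the experiment
times = sorted(timeit.repeat(
    "key not in o",
    setup="o = {str(i): str(i) for i in range(5)}; key = 'x'",
    repeat=1000, number=100_000,
))

nslices = 50  # arbitrary number of time slices
width = (times[-1] - times[0]) / nslices
counts = [0] * nslices

for t in times:
    # count how many timings fall into each fixed-width slice
    index = min(int((t - times[0]) / width), nslices - 1)
    counts[index] += 1

# crude "probability" of landing in each slice
probabilities = [c / len(times) for c in counts]
print(probabilities)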
Hi,
I've done a little experiment: I ran timeit.repeat with repeat=1000 on a really simple statement
Are you talking about the timeit module of the Python standard library? You should use the pyperf timeit command instead. https://pyperf.readthedocs.io/en/latest/cli.html#timeit-cmd
As you can see, the distribution is not a normal distribution. It seems to be multimodal, like a mixture of normal distributions.
It seems like your benchmark has a flaw. Maybe you need to increase the number of warmup runs.
Read also https://pyperf.readthedocs.io/en/latest/system.html
I was a little confusing in my explanation. I'm posting here my "steps to reproduce".
These are the tests done with pyperf without system tune (or with system reset):
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.53 ns) is 15% of the mean (23.5 ns)
* the maximum (50.2 ns) is 114% greater than the mean (23.5 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 23.5 ns +- 3.5 ns
These are the tests done with pyperf with system tune:
(venv_3_10) marco@buzz:~$ sudo python -m pyperf system tune
[sudo] password for marco:
Tune the system configuration to run benchmarks
Actions
=======
Perf event: Max sample rate set to 1 per second
CPU Frequency: Minimum frequency of CPU 0-3 set to the maximum frequency
CPU scaling governor (intel_pstate): CPU scaling governor set to performance
Turbo Boost (intel_pstate): Turbo Boost disabled: '1' written into /sys/devices/system/cpu/intel_pstate/no_turbo
IRQ affinity: Stop irqbalance service
IRQ affinity: Set affinity of IRQ 122-124,126,129-131 to CPU 0-3
System state
============
CPU: use 4 logical CPUs: 0-3
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=max=2600 MHz
CPU scaling governor (intel_pstate): performance
Turbo Boost (intel_pstate): Turbo Boost disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0-3
IRQ affinity: IRQ affinity: IRQ 0-17,51,67,120-131=CPU 0-3
Power supply: the power cable is plugged
Advices
=======
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.38 ns) is 11% of the mean (31.2 ns)
* the maximum (59.2 ns) is 90% greater than the mean (31.2 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.2 ns +- 3.4 ns
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
Mean +- std dev: 31.5 ns +- 2.7 ns
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.76 ns) is 12% of the mean (31.8 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.8 ns +- 3.8 ns
Notice that I've not isolated any CPUs, since I have only 2 cores x 2 threads.
This is the test I've done with timeit. I ran the code both after system tune and after system reset:
from matplotlib import pyplot as plotter
import timeit
import uuid
import statistics
# sort of "warmup"
times = timeit.repeat(
    stmt="key not in o",
    setup="o={str(uuid.uuid4()):str(uuid.uuid4()) for x in range(5)}; key = str(uuid.uuid4())",
    globals={"uuid": uuid}, repeat=1000
)

times = timeit.repeat(
    stmt="key not in o",
    setup="o={str(uuid.uuid4()):str(uuid.uuid4()) for x in range(5)}; key = str(uuid.uuid4())",
    globals={"uuid": uuid}, repeat=1000
)


def getStatistic(data, step=None):
    x = []
    y = []
    sorted_data = sorted(data)

    # ad hoc value
    if step == None:
        step = (sorted_data[1] - sorted_data[0]) * 2

    val = sorted_data[0]
    prob = 0

    for datum in data:
        if datum < val + step:
            prob += 1
        else:
            x.append(val)
            y.append(prob)
            prob = 1
            val = datum

    return (x, y)
(x, y) = getStatistic(times)
xbar = statistics.mean(times)
plotter.figure()
plotter.plot(x, y)
plotter.vlines(xbar, min(y), max(y), colors="r", label="mean")
plotter.grid(True)
plotter.show()
If I run this code after system reset, I get the multimodal distribution. If I run it after system tune, I get no distinct peak: the resulting distribution is always increasing.
So I focused on the results I got from the test with system reset. I subsequently did:
sorted_times = sorted(times)
slice = sorted_times[0:600]
(x, y) = getStatistic(slice, step=(sorted_times[1] - sorted_times[0]) * 2)
xbar = statistics.mean(x)
plotter.figure()
plotter.plot(x, y)
plotter.vlines(xbar, min(y), max(y), colors="r", label="mean")
plotter.grid(True)
plotter.show()
that is, I manually dropped the slowest results (or, in other words, I considered only the first Gaussian curve). The mean in this case is compatible with the resulting distribution. The results are:
>>> xbar
0.022125485974538606
>>> statistics.stdev(slice)
0.000367162257224316
which is compatible with the pyperf run after system reset.
It seems that both you and the timeit devs are right: it makes sense to use the normal distribution, but only around the minimum.
WARNING: the benchmark result may be unstable
- the standard deviation (3.53 ns) is 15% of the mean (23.5 ns)
- the maximum (50.2 ns) is 114% greater than the mean (23.5 ns)
It seems like you have a lot of outliers. You may analyze the results with "pyperf stats" and "pyperf hist". You may try to increase --warmups.
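For example, something along these lines (just an illustration of the workflow; adapt the setup to your statement):

python -m pyperf timeit --warmups 10 -o bench.json "key not in o" -s "from uuid import uuid4; o = {str(uuid4()): str(uuid4()) for i in range(5)}; key = str(uuid4())"
python -m pyperf stats bench.json
python -m pyperf hist bench.json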
Ok, I'll explain it better (and with better code; ignore the first one, it's bugged :) )
TL;DR: I wrote a little script that seems to produce the same results as your timeit. The only difference is that its error is closer to "reality", while pyperf overestimates it.
Long explanation:
I tried to raise the number of loops and warmups to very high values. The result is the same:
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf timeit --rigorous "key not in o" --setup "from uuid import uuid4 ; o = {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i in range(5)} ; key = str(uuid4()).replace('-', '')" -l 10000000 -w 20
........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.84 ns) is 12% of the mean (31.9 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.9 ns +- 3.8 ns
Then I tried to remove the "noise" in my test code. This is the new code:
#!/usr/bin/env python3

def main(loops, seconds, ratio):
    from matplotlib import pyplot as plotter
    import timeit
    import uuid
    import statistics
    from time import perf_counter

    def getStatistic(data, num=None, npiece=None):
        x = []
        y = []
        sorted_data = sorted(data)
        val = sorted_data[0]
        prob = 1

        if npiece == None:
            # TODO does not work well with peaks
            npiece = int(len(data) / 2)

        if num == None:
            num = len(data)

        # TODO step could be variable, for peaks
        step = (sorted_data[-1] - sorted_data[0]) / npiece

        for datum in sorted_data:
            if datum < val + step:
                prob += 1
            else:
                x.append(val)
                y.append(prob / num)
                prob = 1
                val = datum

        return (x, y, x[y.index(max(y))])

    # TODO check if there's an alternative to normal distribution
    def ttest(data, t):
        (x, y, xbar) = getStatistic(data)

        while True:
            sigma = statistics.stdev(data, xbar=xbar)
            new_data = []

            for x in data:
                if abs(x - xbar) < t * sigma:
                    new_data.append(x)

            old_data = data
            data = new_data

            if len(data) == len(old_data):
                data.sort()
                return data

            try:
                (x, y, null) = getStatistic(data)
            except ZeroDivisionError:
                old_data.sort()
                return old_data

    def plot(x_axes, y_axes, xbar):
        for x, y in zip(x_axes, y_axes):
            plotter.figure()
            plotter.grid(True)
            plotter.plot(x, y)
            plotter.vlines(xbar, min(y), max(y), colors="r", label="xbar")

        try:
            plotter.show()
        except KeyboardInterrupt:
            plotter.close("all")
            print()

    def getMagnitudeOrder(num):
        num_str = "{:.1e}".format(num)
        return int(num_str.split("e")[1])

    def getTimes(stmt, setup, seconds, ratio):
        t = timeit.Timer(stmt=stmt, setup=setup)
        number = int(t.autorange()[0] / ratio)
        times = []
        start_time = perf_counter()

        while True:
            times.append(t.timeit(number=number))

            if perf_counter() - start_time > seconds:
                break

        return [t / number for t in times]

    def bench(stmt, setup, seconds, ratio, t=2):
        times = getTimes(stmt=stmt, setup=setup, seconds=seconds, ratio=ratio)
        peak = ttest(times, t=t)
        (x, y, xbar) = getStatistic(times)
        (x2, y2, null) = getStatistic(peak, num=len(times))
        sigma = statistics.stdev(peak, xbar=xbar)

        return {
            "xbar": xbar,
            "sigma": sigma,
            "x_times": x,
            "y_times": y,
            "x_peak": x2,
            "y_peak": y2,
        }

    stmt = "key not in o"

    setup = """
from uuid import uuid4

def getUuid():
    return str(uuid4()).replace("-", "")

o = {getUuid() : getUuid() for i in range(5)}
key = getUuid()
"""

    xbars = []
    sigmas = []

    for i in range(loops):
        res = bench(
            stmt=stmt,
            setup=setup,
            seconds=seconds,
            ratio=ratio
        )

        xbar = res["xbar"]
        sigma = res["sigma"]
        xbars.append(xbar)
        sigmas.append(sigma)

    sigma = statistics.median_low(sigmas)
    i = sigmas.index(sigma)
    xbar = xbars[i]
    x = res["x_times"]
    y = res["y_times"]
    x2 = res["x_peak"]
    y2 = res["y_peak"]

    xbar_magn = getMagnitudeOrder(xbar)
    sigma_magn = getMagnitudeOrder(sigma)
    decs = xbar_magn - sigma_magn
    res_tpl = "Result = {xbar:." + str(decs) + "e} +- {sigma:.0e}"
    print(res_tpl.format(xbar=xbar, sigma=sigma))

    plot((x, x2), (y, y2), xbar)


main(loops=10, seconds=3, ratio=10)
Basically, the script does this:
1. create a timeit.Timer object t
2. take the number of loops from t.autorange(), divided by 10 (an arbitrary number... it seems to work)
3. run t.timeit with that number of loops (which sums the per-loop results), repeatedly, until delta seconds have passed (I set it arbitrarily to 3); this way I get a list of times
4. after that, I sort the times and divide them into steps. By default a step is the difference between the max time and the min time, divided by half the length of times (arbitrary too). This way I have an x. Then I sample the times, counting how many of them fall inside each step, and divide these counts by the length of times to get a sort of probability. This way I have a y. Furthermore, I take as the "true" value of x, xbar, the x that corresponds to the peak of the (x, y) diagram (the most probable time).
5. after that, I discard the most improbable times, using a simple t-test (see the condensed sketch after this list). Then I repeat points 4 and 5 until no data is discarded. Notice that the stdev used for the t-test uses the xbar taken before.
6. then I take the last xbar and stdev. I do this 10 times and take the median result.
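If it helps, steps 4 and 5 boil down to something like this (a condensed sketch with made-up helper names, not the exact code above):

import statistics

def modal_value(times, npiece=None):
    # step 4: bin the sorted timings and return the value of the most
    # populated bin (the "most probable time")
    times = sorted(times)
    if npiece is None:
        npiece = max(1, len(times) // 2)
    step = (times[-1] - times[0]) / npiece
    best_count = count = 0
    best_val = val = times[0]
    for t in times:
        if t < val + step:
            count += 1
        else:
            if count > best_count:
                best_count, best_val = count, val
            count, val = 1, t
    if count > best_count:
        best_val = val
    return best_val

def trim(times, t=2):
    # step 5: drop timings farther than t * sigma from the modal value,
    # recompute sigma, and repeat until nothing more is dropped
    xbar = modal_value(times)
    if len(times) < 2:
        return times, xbar, 0.0
    while True:
        sigma = statistics.stdev(times, xbar=xbar)
        kept = [x for x in times if abs(x - xbar) < t * sigma]
        if len(kept) == len(times) or len(kept) < 2:
            return kept, xbar, sigma
        times = kept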
The results (without the plot):
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -VV
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ ./bench2.py
Result = 2.086e-08 +- 2e-11
[... repeat ...]
Result = 2.096e-08 +- 3e-11
Result = 2.082e-08 +- 4e-11
Result = 2.079e-08 +- 3e-11
Result = 2.084e-08 +- 3e-11
Result = 2.097e-08 +- 4e-11
Result = 2.087e-08 +- 3e-11
Result = 2.091e-08 +- 1e-11
Result = 2.099e-08 +- 2e-11
Result = 2.080e-08 +- 3e-11
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf system show
Show the system configuration
System state
============
CPU: use 4 logical CPUs: 0-3
Perf event: Maximum sample rate: 25000 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=400 MHz, max=3500 MHz
CPU scaling governor (intel_pstate): powersave
Turbo Boost (intel_pstate): Turbo Boost enabled
IRQ affinity: irqbalance service: active
IRQ affinity: Default IRQ affinity: CPU 0-3
IRQ affinity: IRQ affinity: IRQ 0-17,51,67,120-121,125,127,129=CPU 0-3; IRQ 122=CPU 1,3; IRQ 123-124=CPU 0,2; IRQ 126=CPU 0; IRQ 128=CPU 1; IRQ 130=CPU 2; IRQ 131=CPU 3
Power supply: the power cable is plugged
Advices
=======
Perf event: Set max sample rate to 1
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
CPU scaling governor (intel_pstate): Use CPU scaling governor 'performance'
Turbo Boost (intel_pstate): Disable Turbo Boost to get more reliable CPU frequency
Run "python -m pyperf system tune" to tune the system configuration to run benchmarks
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf timeit --rigorous "key not in o" --setup "from uuid import uuid4 ; o = {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i in range(5)} ; key = str(uuid4()).replace('-', '')"
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (2.71 ns) is 12% of the mean (23.3 ns)
* the maximum (36.2 ns) is 55% greater than the mean (23.3 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 23.3 ns +- 2.7 ns
[...repeat...]
Mean +- std dev: 23.0 ns +- 2.8 ns
Mean +- std dev: 23.0 ns +- 2.2 ns
Mean +- std dev: 23.3 ns +- 2.8 ns
Mean +- std dev: 23.1 ns +- 2.6 ns
Mean +- std dev: 23.3 ns +- 2.9 ns
Mean +- std dev: 23.5 ns +- 2.6 ns
Mean +- std dev: 23.4 ns +- 2.8 ns
Mean +- std dev: 23.2 ns +- 2.2 ns
Mean +- std dev: 23.5 ns +- 2.4 ns
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ sudo python -m pyperf system tune
[sudo] password for marco:
Tune the system configuration to run benchmarks
Actions
=======
Perf event: Max sample rate set to 1 per second
CPU Frequency: Minimum frequency of CPU 0-3 set to the maximum frequency
CPU scaling governor (intel_pstate): CPU scaling governor set to performance
Turbo Boost (intel_pstate): Turbo Boost disabled: '1' written into /sys/devices/system/cpu/intel_pstate/no_turbo
IRQ affinity: Stop irqbalance service
IRQ affinity: Set affinity of IRQ 122-124,126,128,130-131 to CPU 0-3
System state
============
CPU: use 4 logical CPUs: 0-3
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=max=2600 MHz
CPU scaling governor (intel_pstate): performance
Turbo Boost (intel_pstate): Turbo Boost disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0-3
IRQ affinity: IRQ affinity: IRQ 0-17,51,67,120-131=CPU 0-3
Power supply: the power cable is plugged
Advices
=======
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ ./bench2.py
Result = 2.808e-08 +- 4e-11
Result = 2.816e-08 +- 2e-11
Result = 2.810e-08 +- 3e-11
Result = 2.812e-08 +- 4e-11
Result = 2.809e-08 +- 2e-11
Result = 2.814e-08 +- 2e-11
Result = 2.817e-08 +- 3e-11
Result = 2.810e-08 +- 3e-11
Result = 2.821e-08 +- 2e-11
Result = 2.818e-08 +- 3e-11
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf timeit --rigorous "key not in o" --setup "from uuid import uuid4 ; o = {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i in range(5)} ; key = str(uuid4()).replace('-', '')"
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.35 ns) is 11% of the mean (31.1 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.1 ns +- 3.4 ns
Mean +- std dev: 30.4 ns +- 2.9 ns
Mean +- std dev: 30.9 ns +- 3.1 ns
Mean +- std dev: 31.2 ns +- 3.9 ns
Mean +- std dev: 31.2 ns +- 2.8 ns
Mean +- std dev: 30.8 ns +- 2.8 ns
Mean +- std dev: 31.5 ns +- 3.6 ns
Mean +- std dev: 31.6 ns +- 3.4 ns
Mean +- std dev: 30.8 ns +- 2.7 ns
Mean +- std dev: 30.8 ns +- 3.2 ns
Some considerations:
- the error of my script seems to be more "realistic"
- the bench times of my script tend to be lower
So, as I said in my previous post, it seems that both you and the timeit devs are right: you have to do some statistical work to remove the "noise", but the "real" value is around the minimum.
The pyperf API gives you access to all values. You can use min(), max(), median(), etc. You're free to ignore outliers.
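For example, something like this (a rough sketch; "bench.json" stands for whatever result file you saved with -o):

import statistics
import pyperf

# "bench.json" is a hypothetical file name, e.g. produced with:
#   python -m pyperf timeit ... -o bench.json
suite = pyperf.BenchmarkSuite.load("bench.json")

for bench in suite.get_benchmarks():
    values = sorted(bench.get_values())
    print(bench.get_name())
    print("  min   :", min(values))
    print("  median:", statistics.median(values))
    print("  mean  :", statistics.mean(values))
    # e.g. ignore the slowest 25% of values before averaging
    trimmed = values[: max(1, int(len(values) * 0.75))]
    print("  trimmed mean:", statistics.mean(trimmed))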
But I don't want to change the default: mean +- std dev is likely the best compromise for most common use cases.
To measure a microbenchmark faster than 100 ns, you should follow pyperf's advice:
Advices
=======
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
Ok, I can try these kernel parameters, but I did not suggest removing mean and stdev; I suggested removing the "unlikely" results first. I don't know how you measure these quantities, but if you don't remove the "tail" of the measures, it's just wrong, since mean and stdev apply to a Gaussian distribution and, as you can see from the graph, this is not a Gaussian distribution.
@Marco-Sulla the sporadic results might just be due to string hash randomization: the lowest 3 bits of random strings sometimes collide, so the dict lookup has to take an extra step (or several) to probe another slot. As a proof of this concept, running pyperf timeit "list(range(hash('python') % 10_000))" gave me the very unstable Mean +- std dev: 56.3 us +- 39.6 us. I would be curious if your results are replicated with a more deterministic benchmark, say, using integers.
mean and stdev apply to a Gaussian distribution
Regardless of the distribution, multiplying the mean by the size of the data is a good estimator for the sum. When benchmarking code that runs many times, the sum is generally what I think people are looking to estimate. Question: "If I run this a million times, how long will that take in total?" Regardless of the type of distribution, a very good answer is: "One million times the mean."
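A quick illustration with made-up numbers (not real benchmark data):

import random
import statistics

# Made-up skewed timings (seconds): mostly fast, with occasional slow outliers.
random.seed(0)
values = [20e-9 if random.random() < 0.9 else 60e-9 for _ in range(1_000_000)]

print("true total :", sum(values))
print("n * mean   :", len(values) * statistics.mean(values))    # matches the total
print("n * median :", len(values) * statistics.median(values))  # underestimates it

By construction, n times the mean reproduces the total exactly; the median (or the minimum) does not.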
I don't think outliers should be ignored without good reason, and I think your outliers are legitimate data points in this case, since hash randomization is a real part of the environment in which the code will be run. One could probably find a way to force a particular string hashing seed and run the benchmarks more deterministically on multiple different seeds to get consistent results for each.
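For example, one could rerun the same micro-benchmark in subprocesses with the hash seed pinned via PYTHONHASHSEED (just a sketch, reusing the kind of setup used earlier in the thread):

import os
import subprocess
import sys

STMT = "key not in o"
SETUP = ("from uuid import uuid4; "
         "o = {str(uuid4()): str(uuid4()) for i in range(5)}; "
         "key = str(uuid4())")

for seed in ("0", "1", "2", "3"):
    # PYTHONHASHSEED pins str hash randomization for the child interpreter
    env = dict(os.environ, PYTHONHASHSEED=seed)
    proc = subprocess.run(
        [sys.executable, "-m", "timeit", "-s", SETUP, STMT],
        env=env, stdout=subprocess.PIPE, universal_newlines=True,
    )
    print("seed", seed, "->", proc.stdout.strip())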
And standard deviation is a good heuristic measure of spread, which can help detect when something is behaving unexpectedly or inconsistently, even if you can't produce a precise confidence interval from it without knowing the type of distribution. Two numbers can't possibly convey everything, but IMO mean and standard deviation are a good default.
Regardless of the distribution, multiplying the mean by the size of the data is a good estimator for the sum.
Yes, but I calculate the mean and the standard dev after removing the "tail". The data shows that the times form multiple Gaussians, where the highest Gaussian is the first one, containing the fastest benchs. So I remove the other Gaussians. How? With a simple t-test: if a datum is more than X sigma above the fastest bench, I remove it, where sigma is calculated using the minimum as the real value, and not the mean. It's imprecise, since the minimum is on the left tail of the first Gaussian, but it's a good approximation. This is my algorithm:
from math import sqrt
from time import time
import timeit


def mindev(data, xbar = None):
    """
    This function calculates the stdev around the _minimum_ datum, and not the
    mean
    """
    if not data:
        raise ValueError("No data")

    if xbar == None:
        xbar = min(data)

    sigma2 = 0

    for x in data:
        sigma2 += (x - xbar) ** 2

    N = len(data) - 1

    if N < 1:
        N = 1

    return sqrt(sigma2 / N)


def autorange(stmt, setup="pass", globals=None, ratio=1000, bench_time=10, number=None):
    if setup == None:
        setup = "pass"

    t = timeit.Timer(stmt=stmt, setup=setup, globals=globals)
    break_immediately = False

    if number == None:
        # get automatically the number of needed loops
        a = t.autorange()
        num = a[0]
        # the number of loops
        number = int(num / ratio)

        if number < 1:
            number = 1

        # the number of repeats of the loops
        repeat = int(num / number)

        if repeat < 1:
            repeat = 1
    else:
        repeat = 1
        break_immediately = True

    results = []
    data_tmp = t.repeat(number=number, repeat=repeat)
    min_value = min(data_tmp)
    # I create a list with the minimum values
    data_min = [min_value]
    bench_start = time()

    while 1:
        # I get additional benchs until `bench_time` seconds pass
        data_min.extend(t.repeat(number=number, repeat=repeat))

        if break_immediately or time() - bench_start > bench_time:
            break

    # I sort the data...
    data_min.sort()
    # ... so the minimum is the first datum
    xbar = data_min[0]
    i = 0

    # I repeat until no data is removed
    while i < len(data_min):
        i = len(data_min)
        # I calculate the sigma using the minimum as the "real" value, and not
        # the mean
        sigma = mindev(data_min, xbar=xbar)

        for i in range(2, len(data_min)):
            # I find the point where the data are greater than
            # 3 sigma. Data are sorted...
            if data_min[i] - xbar > 3 * sigma:
                break

        k = i

        # do not remove too much data
        if i < 5:
            k = 5

        # remove the data with sigma > 3
        del data_min[k:]

    # I return the minimum as the real value, with the sigma calculated around
    # the minimum
    return (min(data_min) / number, mindev(data_min, xbar=xbar) / number)
Maybe the last step, the return, is incorrect. I could return the normal mean and stdev, since data_min should contain only the first Gaussian.
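For completeness, it can be called like this (a hypothetical usage; the setup string is just an example):

t, sigma = autorange(
    "key not in o",
    setup=("from uuid import uuid4; "
           "o = {str(uuid4()): str(uuid4()) for i in range(5)}; "
           "key = str(uuid4())"),
    bench_time=3,
)
print("Result = {:.3e} +- {:.0e}".format(t, sigma))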