Reconsidering min()?
I've done a little experiment: I ran timeit.repeat with repeat=1000 on a really simple statement. After that, I sorted the list and grouped the results into time slices, counting how many benchmark values fall inside each slice. This way I get a sort of probability. Then I plotted the result:
https://www.dropbox.com/s/f4naryc055k42cs/2020-07-24-080340_1366x768_scrot.png?dl=0
The result is quite interesting. As you can see, the distribution is not a normal distribution. It seems to be multimodal, like a mixture of normal distributions. Furthermore, the highest probability is near the minimum value, not near the average (which lies on the second peak).
Mean and stdev are only meaningful if the distribution is normal. It seems that the only way to get something like a normal distribution is to consider only the first part of the plot.
Since pyperf returns mean and stdev, does it "cut" the slowest runs?
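For reference, the binning I mean is roughly this (a simplified sketch; the statement, setup and number of slices here are stand-ins for the real ones):

import timeit

# simplified stand-in for the real statement/setup used in the experiment
times = sorted(timeit.repeat(
    "key not in o",
    setup="o = {str(i): str(i) for i in range(5)}; key = 'x'",
    repeat=1000, number=100_000,
))

nslices = 50  # arbitrary number of time slices
width = (times[-1] - times[0]) / nslices
counts = [0] * nslices

for t in times:
    # count how many timings fall into each fixed-width slice
    index = min(int((t - times[0]) / width), nslices - 1)
    counts[index] += 1

# crude "probability" of landing in each slice
probabilities = [c / len(times) for c in counts]
print(probabilities)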
Hi,
I've done a little experiment: I ran timeit.repeat with repeat=1000 on a really simple statement
Are you talking about the timeit module of the Python standard library? You should use the pyperf timeit command instead. https://pyperf.readthedocs.io/en/latest/cli.html#timeit-cmd
As you can see, the distribution is not a normal distribution. It seems to be multimodal, like a mixture of normal distributions.
It seems like your benchmark has a flaw. Maybe you need to increase the number of warmup runs.
Read also https://pyperf.readthedocs.io/en/latest/system.html
I was a little confusing in my explanation. I'm posting here my "steps to reproduce".
These are the tests done with pyperf without system tune (or with system reset):
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.53 ns) is 15% of the mean (23.5 ns)
* the maximum (50.2 ns) is 114% greater than the mean (23.5 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 23.5 ns +- 3.5 ns
These are the tests done with pyperf with system tune:
(venv_3_10) marco@buzz:~$ sudo python -m pyperf system tune
[sudo] password for marco:
Tune the system configuration to run benchmarks
Actions
=======
Perf event: Max sample rate set to 1 per second
CPU Frequency: Minimum frequency of CPU 0-3 set to the maximum frequency
CPU scaling governor (intel_pstate): CPU scaling governor set to performance
Turbo Boost (intel_pstate): Turbo Boost disabled: '1' written into /sys/devices/system/cpu/intel_pstate/no_turbo
IRQ affinity: Stop irqbalance service
IRQ affinity: Set affinity of IRQ 122-124,126,129-131 to CPU 0-3
System state
============
CPU: use 4 logical CPUs: 0-3
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=max=2600 MHz
CPU scaling governor (intel_pstate): performance
Turbo Boost (intel_pstate): Turbo Boost disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0-3
IRQ affinity: IRQ affinity: IRQ 0-17,51,67,120-131=CPU 0-3
Power supply: the power cable is plugged
Advices
=======
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.38 ns) is 11% of the mean (31.2 ns)
* the maximum (59.2 ns) is 90% greater than the mean (31.2 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.2 ns +- 3.4 ns
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
Mean +- std dev: 31.5 ns +- 2.7 ns
(venv_3_10) marco@buzz:~$ python -m pyperf timeit --rigorous "key not in o" -s """
import uuid
o = {str(uuid.uuid4()) : str(uuid.uuid4()) for x in range(5)}
key = str(uuid.uuid4())
"""
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.76 ns) is 12% of the mean (31.8 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.8 ns +- 3.8 ns
Notice that I've not isolated any CPUs, since I have only 2 cores x 2 threads.
This is the test I've done with timeit. I ran the code both after system tune and after system reset:
from matplotlib import pyplot as plotter
import timeit
import uuid
import statistics
# sort of "warmup"
times = timeit.repeat(
    stmt="key not in o",
    setup="o={str(uuid.uuid4()):str(uuid.uuid4()) for x in range(5)}; key = str(uuid.uuid4())",
    globals={"uuid": uuid}, repeat=1000
)

times = timeit.repeat(
    stmt="key not in o",
    setup="o={str(uuid.uuid4()):str(uuid.uuid4()) for x in range(5)}; key = str(uuid.uuid4())",
    globals={"uuid": uuid}, repeat=1000
)


def getStatistic(data, step=None):
    x = []
    y = []
    sorted_data = sorted(data)

    # ad hoc value
    if step == None:
        step = (sorted_data[1] - sorted_data[0]) * 2

    val = sorted_data[0]
    prob = 0

    for datum in data:
        if datum < val + step:
            prob += 1
        else:
            x.append(val)
            y.append(prob)
            prob = 1
            val = datum

    return (x, y)
(x, y) = getStatistic(times)
xbar = statistics.mean(times)
plotter.figure()
plotter.plot(x, y)
plotter.vlines(xbar, min(y), max(y), colors="r", label="mean")
plotter.grid(True)
plotter.show()
If I run this code after system reset, I get the multimodal distribution. If I run it after system tune, I get no distinct peak: the resulting distribution is always increasing.
So I focused on the results I got from the test with system reset. I subsequently did:
sorted_times = sorted(times)
slice = sorted_times[0:600]
(x, y) = getStatistic(slice, step=(sorted_times[1] - sorted_times[0]) * 2)
xbar = statistics.mean(x)
plotter.figure()
plotter.plot(x, y)
plotter.vlines(xbar, min(y), max(y), colors="r", label="mean")
plotter.grid(True)
plotter.show()
that is, I manually dropped the slowest results (or, in other words, I considered only the first Gaussian curve). The mean in this case is compatible with the resulting distribution. The results are:
>>> xbar
0.022125485974538606
>>> statistics.stdev(slice)
0.000367162257224316
which is compatible with the pyperf run after system reset.
It seems that both you and the timeit devs are right: it makes sense to use the normal distribution, but only around the minimum.
WARNING: the benchmark result may be unstable
- the standard deviation (3.53 ns) is 15% of the mean (23.5 ns)
- the maximum (50.2 ns) is 114% greater than the mean (23.5 ns)
It seems like you have a lot of outliers. You may analyze the results with "pyperf stats" and "pyperf hist". You may try to increase --warmups.
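For example, something along these lines (just an illustration of the workflow; adapt the setup to your statement):

python -m pyperf timeit --warmups 10 -o bench.json "key not in o" -s "from uuid import uuid4; o = {str(uuid4()): str(uuid4()) for i in range(5)}; key = str(uuid4())"
python -m pyperf stats bench.json
python -m pyperf hist bench.json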
Ok, I'll explain it better (and with better code; ignore the first one, it's bugged :) )
TL;DR: I wrote a little script that seems to produce the same results as your timeit. The only difference is that its error is closer to "reality", while pyperf overestimates it.
Long explanation:
I tried to raise the number of loops and warmups to very high values. The result is the same:
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf timeit --rigorous "key not in o" --setup "from uuid import uuid4 ; o = {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i in range(5)} ; key = str(uuid4()).replace('-', '')" -l 10000000 -w 20
........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.84 ns) is 12% of the mean (31.9 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.9 ns +- 3.8 ns
Then I tried to remove the "noise" in my test code. This is the new code:
#!/usr/bin/env python3

def main(loops, seconds, ratio):
    from matplotlib import pyplot as plotter
    import timeit
    import uuid
    import statistics
    from time import perf_counter

    def getStatistic(data, num=None, npiece=None):
        x = []
        y = []
        sorted_data = sorted(data)
        val = sorted_data[0]
        prob = 1

        if npiece == None:
            # TODO does not work well with peaks
            npiece = int(len(data) / 2)

        if num == None:
            num = len(data)

        # TODO step could be variable, for peaks
        step = (sorted_data[-1] - sorted_data[0]) / npiece

        for datum in sorted_data:
            if datum < val + step:
                prob += 1
            else:
                x.append(val)
                y.append(prob / num)
                prob = 1
                val = datum

        return (x, y, x[y.index(max(y))])

    # TODO check if there's an alternative to normal distribution
    def ttest(data, t):
        (x, y, xbar) = getStatistic(data)

        while True:
            sigma = statistics.stdev(data, xbar=xbar)
            new_data = []

            for x in data:
                if abs(x - xbar) < t * sigma:
                    new_data.append(x)

            old_data = data
            data = new_data

            if len(data) == len(old_data):
                data.sort()
                return data

            try:
                (x, y, null) = getStatistic(data)
            except ZeroDivisionError:
                old_data.sort()
                return old_data

    def plot(x_axes, y_axes, xbar):
        for x, y in zip(x_axes, y_axes):
            plotter.figure()
            plotter.grid(True)
            plotter.plot(x, y)
            plotter.vlines(xbar, min(y), max(y), colors="r", label="xbar")

        try:
            plotter.show()
        except KeyboardInterrupt:
            plotter.close("all")
            print()

    def getMagnitudeOrder(num):
        num_str = "{:.1e}".format(num)
        return int(num_str.split("e")[1])

    def getTimes(stmt, setup, seconds, ratio):
        t = timeit.Timer(stmt=stmt, setup=setup)
        number = int(t.autorange()[0] / ratio)
        times = []
        start_time = perf_counter()

        while True:
            times.append(t.timeit(number=number))

            if perf_counter() - start_time > seconds:
                break

        return [t / number for t in times]

    def bench(stmt, setup, seconds, ratio, t=2):
        times = getTimes(stmt=stmt, setup=setup, seconds=seconds, ratio=ratio)
        peak = ttest(times, t=t)
        (x, y, xbar) = getStatistic(times)
        (x2, y2, null) = getStatistic(peak, num=len(times))
        sigma = statistics.stdev(peak, xbar=xbar)

        return {
            "xbar": xbar,
            "sigma": sigma,
            "x_times": x,
            "y_times": y,
            "x_peak": x2,
            "y_peak": y2,
        }

    stmt = "key not in o"

    setup = """
from uuid import uuid4

def getUuid():
    return str(uuid4()).replace("-", "")

o = {getUuid() : getUuid() for i in range(5)}
key = getUuid()
"""

    xbars = []
    sigmas = []

    for i in range(loops):
        res = bench(
            stmt=stmt,
            setup=setup,
            seconds=seconds,
            ratio=ratio
        )

        xbar = res["xbar"]
        sigma = res["sigma"]
        xbars.append(xbar)
        sigmas.append(sigma)

    sigma = statistics.median_low(sigmas)
    i = sigmas.index(sigma)
    xbar = xbars[i]
    x = res["x_times"]
    y = res["y_times"]
    x2 = res["x_peak"]
    y2 = res["y_peak"]

    xbar_magn = getMagnitudeOrder(xbar)
    sigma_magn = getMagnitudeOrder(sigma)
    decs = xbar_magn - sigma_magn
    res_tpl = "Result = {xbar:." + str(decs) + "e} +- {sigma:.0e}"
    print(res_tpl.format(xbar=xbar, sigma=sigma))

    plot((x, x2), (y, y2), xbar)


main(loops=10, seconds=3, ratio=10)
Basically, the script does this:
1. create a timeit.Timer object t
2. take the number of loops from t.autorange(), divided by 10 (an arbitrary number... it seems to work)
3. run t.timeit with that number of loops (which sums the per-loop results), repeatedly, until delta seconds have passed (I set it arbitrarily to 3); this way I get a list of times
4. after that, I sort the times and divide them into steps. By default a step is the difference between the max time and the min time, divided by half the length of times (arbitrary too). This way I have an x. Then I sample the times, counting how many of them fall inside each step, and divide these counts by the length of times to get a sort of probability. This way I have a y. Furthermore, I take as the "true" value of x, xbar, the x that corresponds to the peak of the (x, y) diagram (the most probable time).
5. after that, I discard the most improbable times, using a simple t-test (see the condensed sketch after this list). Then I repeat points 4 and 5 until no data is discarded. Notice that the stdev used for the t-test uses the xbar taken before.
6. then I take the last xbar and stdev. I do this 10 times and take the median result.
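If it helps, steps 4 and 5 boil down to something like this (a condensed sketch with made-up helper names, not the exact code above):

import statistics

def modal_value(times, npiece=None):
    # step 4: bin the sorted timings and return the value of the most
    # populated bin (the "most probable time")
    times = sorted(times)
    if npiece is None:
        npiece = max(1, len(times) // 2)
    step = (times[-1] - times[0]) / npiece
    best_count = count = 0
    best_val = val = times[0]
    for t in times:
        if t < val + step:
            count += 1
        else:
            if count > best_count:
                best_count, best_val = count, val
            count, val = 1, t
    if count > best_count:
        best_val = val
    return best_val

def trim(times, t=2):
    # step 5: drop timings farther than t * sigma from the modal value,
    # recompute sigma, and repeat until nothing more is dropped
    xbar = modal_value(times)
    if len(times) < 2:
        return times, xbar, 0.0
    while True:
        sigma = statistics.stdev(times, xbar=xbar)
        kept = [x for x in times if abs(x - xbar) < t * sigma]
        if len(kept) == len(times) or len(kept) < 2:
            return kept, xbar, sigma
        times = kept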
The results (without the plot):
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -VV
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ ./bench2.py
Result = 2.086e-08 +- 2e-11
[... repeat ...]
Result = 2.096e-08 +- 3e-11
Result = 2.082e-08 +- 4e-11
Result = 2.079e-08 +- 3e-11
Result = 2.084e-08 +- 3e-11
Result = 2.097e-08 +- 4e-11
Result = 2.087e-08 +- 3e-11
Result = 2.091e-08 +- 1e-11
Result = 2.099e-08 +- 2e-11
Result = 2.080e-08 +- 3e-11
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf system show
Show the system configuration
System state
============
CPU: use 4 logical CPUs: 0-3
Perf event: Maximum sample rate: 25000 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=400 MHz, max=3500 MHz
CPU scaling governor (intel_pstate): powersave
Turbo Boost (intel_pstate): Turbo Boost enabled
IRQ affinity: irqbalance service: active
IRQ affinity: Default IRQ affinity: CPU 0-3
IRQ affinity: IRQ affinity: IRQ 0-17,51,67,120-121,125,127,129=CPU 0-3; IRQ 122=CPU 1,3; IRQ 123-124=CPU 0,2; IRQ 126=CPU 0; IRQ 128=CPU 1; IRQ 130=CPU 2; IRQ 131=CPU 3
Power supply: the power cable is plugged
Advices
=======
Perf event: Set max sample rate to 1
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
CPU scaling governor (intel_pstate): Use CPU scaling governor 'performance'
Turbo Boost (intel_pstate): Disable Turbo Boost to get more reliable CPU frequency
Run "python -m pyperf system tune" to tune the system configuration to run benchmarks
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf timeit --rigorous "key not in o" --setup "from uuid import uuid4 ; o = {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i in range(5)} ; key = str(uuid4()).replace('-', '')"
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (2.71 ns) is 12% of the mean (23.3 ns)
* the maximum (36.2 ns) is 55% greater than the mean (23.3 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 23.3 ns +- 2.7 ns
[...repeat...]
Mean +- std dev: 23.0 ns +- 2.8 ns
Mean +- std dev: 23.0 ns +- 2.2 ns
Mean +- std dev: 23.3 ns +- 2.8 ns
Mean +- std dev: 23.1 ns +- 2.6 ns
Mean +- std dev: 23.3 ns +- 2.9 ns
Mean +- std dev: 23.5 ns +- 2.6 ns
Mean +- std dev: 23.4 ns +- 2.8 ns
Mean +- std dev: 23.2 ns +- 2.2 ns
Mean +- std dev: 23.5 ns +- 2.4 ns
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ sudo python -m pyperf system tune
[sudo] password for marco:
Tune the system configuration to run benchmarks
Actions
=======
Perf event: Max sample rate set to 1 per second
CPU Frequency: Minimum frequency of CPU 0-3 set to the maximum frequency
CPU scaling governor (intel_pstate): CPU scaling governor set to performance
Turbo Boost (intel_pstate): Turbo Boost disabled: '1' written into /sys/devices/system/cpu/intel_pstate/no_turbo
IRQ affinity: Stop irqbalance service
IRQ affinity: Set affinity of IRQ 122-124,126,128,130-131 to CPU 0-3
System state
============
CPU: use 4 logical CPUs: 0-3
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=max=2600 MHz
CPU scaling governor (intel_pstate): performance
Turbo Boost (intel_pstate): Turbo Boost disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0-3
IRQ affinity: IRQ affinity: IRQ 0-17,51,67,120-131=CPU 0-3
Power supply: the power cable is plugged
Advices
=======
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ ./bench2.py
Result = 2.808e-08 +- 4e-11
Result = 2.816e-08 +- 2e-11
Result = 2.810e-08 +- 3e-11
Result = 2.812e-08 +- 4e-11
Result = 2.809e-08 +- 2e-11
Result = 2.814e-08 +- 2e-11
Result = 2.817e-08 +- 3e-11
Result = 2.810e-08 +- 3e-11
Result = 2.821e-08 +- 2e-11
Result = 2.818e-08 +- 3e-11
(venv_3_6) marco@buzz:~/sources/cpython-frozendict$ python -m pyperf timeit --rigorous "key not in o" --setup "from uuid import uuid4 ; o = {str(uuid4()).replace('-', '') : str(uuid4()).replace('-', '') for i in range(5)} ; key = str(uuid4()).replace('-', '')"
.........................................
WARNING: the benchmark result may be unstable
* the standard deviation (3.35 ns) is 11% of the mean (31.1 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist to analyze results.
Use --quiet option to hide these warnings.
Mean +- std dev: 31.1 ns +- 3.4 ns
Mean +- std dev: 30.4 ns +- 2.9 ns
Mean +- std dev: 30.9 ns +- 3.1 ns
Mean +- std dev: 31.2 ns +- 3.9 ns
Mean +- std dev: 31.2 ns +- 2.8 ns
Mean +- std dev: 30.8 ns +- 2.8 ns
Mean +- std dev: 31.5 ns +- 3.6 ns
Mean +- std dev: 31.6 ns +- 3.4 ns
Mean +- std dev: 30.8 ns +- 2.7 ns
Mean +- std dev: 30.8 ns +- 3.2 ns
Some considerations:
- the error of my script seems to be more "realistic"
- the bench times of my script tend to be lower
So, as I said in my previous post, it seems that both you and the timeit devs are right: you have to do some statistical work to remove the "noise", but the "real" value is around the minimum.
The pyperf API gives you access to all values. You can use min(), max(), median(), etc. You're free to ignore outliers.
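For example, something like this (a rough sketch; "bench.json" stands for whatever result file you saved with -o):

import statistics
import pyperf

# "bench.json" is a hypothetical file name, e.g. produced with:
#   python -m pyperf timeit ... -o bench.json
suite = pyperf.BenchmarkSuite.load("bench.json")

for bench in suite.get_benchmarks():
    values = sorted(bench.get_values())
    print(bench.get_name())
    print("  min   :", min(values))
    print("  median:", statistics.median(values))
    print("  mean  :", statistics.mean(values))
    # e.g. ignore the slowest 25% of values before averaging
    trimmed = values[: max(1, int(len(values) * 0.75))]
    print("  trimmed mean:", statistics.mean(trimmed))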
But I don't want to change the default: mean +- std dev is likely the best compromise for most common use cases.
To measure a microbenchmark faster than 100 ns, you should follow pyperf's advice:
Advices
=======
Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
Ok, I can try these kernel parameters, but I did not suggest removing mean and stdev; I suggested removing the "unlikely" results first. I don't know how you measure these quantities, but if you don't remove the "tail" of the measures, it's just wrong, since mean and stdev apply to a Gaussian distribution and, as you can see from the graph, this is not a Gaussian distribution.
@Marco-Sulla the sporadic results might just be due to string hash randomization: the lowest 3 bits of random strings sometimes collide, so the dict lookup has to take an extra step (or several) to probe another slot. As a proof of this concept, running pyperf timeit "list(range(hash('python') % 10_000))" gave me the very unstable Mean +- std dev: 56.3 us +- 39.6 us. I would be curious if your results are replicated with a more deterministic benchmark, say, using integers.
mean and stdev apply to a Gaussian distribution
Regardless of the distribution, multiplying the mean by the size of the data is a good estimator for the sum. When benchmarking code that runs many times, the sum is generally what I think people are looking to estimate. Question: "If I run this a million times, how long will that take in total?" Regardless of the type of distribution, a very good answer is: "One million times the mean."
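A quick illustration with made-up numbers (not real benchmark data):

import random
import statistics

# Made-up skewed timings (seconds): mostly fast, with occasional slow outliers.
random.seed(0)
values = [20e-9 if random.random() < 0.9 else 60e-9 for _ in range(1_000_000)]

print("true total :", sum(values))
print("n * mean   :", len(values) * statistics.mean(values))    # matches the total
print("n * median :", len(values) * statistics.median(values))  # underestimates it

By construction, n times the mean reproduces the total exactly; the median (or the minimum) does not.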
I don't think outliers should be ignored without good reason, and I think your outliers are legitimate data points in this case, since hash randomization is a real part of the environment in which the code will be run. One could probably find a way to force a particular string hashing seed and run the benchmarks more deterministically on multiple different seeds to get consistent results for each.
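For example, one could rerun the same micro-benchmark in subprocesses with the hash seed pinned via PYTHONHASHSEED (just a sketch, reusing the kind of setup used earlier in the thread):

import os
import subprocess
import sys

STMT = "key not in o"
SETUP = ("from uuid import uuid4; "
         "o = {str(uuid4()): str(uuid4()) for i in range(5)}; "
         "key = str(uuid4())")

for seed in ("0", "1", "2", "3"):
    # PYTHONHASHSEED pins str hash randomization for the child interpreter
    env = dict(os.environ, PYTHONHASHSEED=seed)
    proc = subprocess.run(
        [sys.executable, "-m", "timeit", "-s", SETUP, STMT],
        env=env, stdout=subprocess.PIPE, universal_newlines=True,
    )
    print("seed", seed, "->", proc.stdout.strip())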
And standard deviation is a good heuristic measure of spread, which can help detect when something is behaving unexpectedly or inconsistently, even if you can't produce a precise confidence interval from it without knowing the type of distribution. Two numbers can't possibly convey everything, but IMO mean and standard deviation are a good default.
Regardless of the distribution, multiplying the mean by the size of the data is a good estimator for the sum.
Yes, but I calculate the mean and the standard dev after removing the "tail". The data shows that the times form multiple Gaussians, where the highest Gaussian is the first one, containing the fastest benchs. So I remove the other Gaussians. How? With a simple t-test: if a datum is more than X sigma above the fastest bench, I remove it, where sigma is calculated using the minimum as the real value, and not the mean. It's imprecise, since the minimum is on the left tail of the first Gaussian, but it's a good approximation. This is my algorithm:
from math import sqrt
from time import time
import timeit


def mindev(data, xbar = None):
    """
    This function calculates the stdev around the _minimum_ datum, and not the
    mean
    """
    if not data:
        raise ValueError("No data")

    if xbar == None:
        xbar = min(data)

    sigma2 = 0

    for x in data:
        sigma2 += (x - xbar) ** 2

    N = len(data) - 1

    if N < 1:
        N = 1

    return sqrt(sigma2 / N)


def autorange(stmt, setup="pass", globals=None, ratio=1000, bench_time=10, number=None):
    if setup == None:
        setup = "pass"

    t = timeit.Timer(stmt=stmt, setup=setup, globals=globals)
    break_immediately = False

    if number == None:
        # get automatically the number of needed loops
        a = t.autorange()
        num = a[0]
        # the number of loops
        number = int(num / ratio)

        if number < 1:
            number = 1

        # the number of repeats of the loops
        repeat = int(num / number)

        if repeat < 1:
            repeat = 1
    else:
        repeat = 1
        break_immediately = True

    results = []
    data_tmp = t.repeat(number=number, repeat=repeat)
    min_value = min(data_tmp)
    # I create a list with the minimum values
    data_min = [min_value]
    bench_start = time()

    while 1:
        # I get additional benchs until `bench_time` seconds pass
        data_min.extend(t.repeat(number=number, repeat=repeat))

        if break_immediately or time() - bench_start > bench_time:
            break

    # I sort the data...
    data_min.sort()
    # ... so the minimum is the first datum
    xbar = data_min[0]
    i = 0

    # I repeat until no data is removed
    while i < len(data_min):
        i = len(data_min)
        # I calculate the sigma using the minimum as the "real" value, and not
        # the mean
        sigma = mindev(data_min, xbar=xbar)

        for i in range(2, len(data_min)):
            # I find the point where the data are greater than
            # 3 sigma. Data are sorted...
            if data_min[i] - xbar > 3 * sigma:
                break

        k = i

        # do not remove too much data
        if i < 5:
            k = 5

        # remove the data with sigma > 3
        del data_min[k:]

    # I return the minimum as the real value, with the sigma calculated around
    # the minimum
    return (min(data_min) / number, mindev(data_min, xbar=xbar) / number)
Maybe the last step, the return, is incorrect. I could return the normal mean and stdev, since data_min should contain only the first Gaussian.
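For completeness, it can be called like this (a hypothetical usage; the setup string is just an example):

t, sigma = autorange(
    "key not in o",
    setup=("from uuid import uuid4; "
           "o = {str(uuid4()): str(uuid4()) for i in range(5)}; "
           "key = str(uuid4())"),
    bench_time=3,
)
print("Result = {:.3e} +- {:.0e}".format(t, sigma))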