Improving the reporting of uncertainties in the calculation made by boaviztapi
Problem
I believe we should improve the way we handle uncertainty in the calculation made in boaviztapi. This issue is mainly meant to start the discussion on how that could be improved.
When using the API to request the impact of a server, VM, etc., figures are given with a high number of digits, a significant_figures parameter, and a min and max value.
For example:
"gwp": {
"embedded": {
"value": 636.11,
"significant_figures": 5,
"min": 252.48,
"max": 2010.6,
"warnings": [
"End of life is not included in the calculation"
]
},
Based on the discussion I had with @da-ekchajzer on Mattermost:
- the min and max values are calculated based on the uncertainties we have on the configuration of the equipment. For example, for a server, if the request did not specify the exact amount of RAM, we calculate the impact for a high and a low amount of RAM.
- the uncertainties coming from the reference data are not represented. E.g. we could consider that the impact value of 1 GB of RAM or 1 mm² of die has an uncertainty of 10%, but we do not account explicitly for that in the result.
- the value of significant_figures is not really calculated but mostly comes from a configuration file.
In the example above, I think the value 636.11 does not really have 5 significant digits and, given the min and max the API returns for the impact, we should apply some rounding. We can probably say that the GWP impact for this server is around 600 kgCO2e, but 636.11 is clearly too precise ;)
Solution
Regarding the rounding, I suggest using something like the function below (rough code, needs polishing!): it rounds the value based on the delta between the min and max values returned and a precision parameter (which is a percentage); if there is a large difference between min and max, the rounding is more aggressive.
For example:
- Approx for 252.48 < 636.11 < 2010.6 with precision 10% = 600
- Approx for 252.48 < 636.11 < 2010.6 with precision 1% = 640
```python
import math

def round_value(val, min_val, max_val, precision):
    # value for precision% of the min-max delta
    approx = (max_val - min_val) / (100 / precision)
    # order of magnitude (power of ten) of that slice of the delta
    significant = math.floor(math.log10(approx))
    # round the value to the nearest multiple of 10**significant
    rounded = round(val / 10 ** significant) * 10 ** significant
    return rounded
```
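As a quick check, here is the function applied to the example figures above (min and max taken from the JSON response):

```python
print(round_value(636.11, 252.48, 2010.6, precision=10))  # -> 600
print(round_value(636.11, 252.48, 2010.6, precision=1))   # -> 640
```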
Alternatives
This approach helps solve the rounding issue; something else is needed for the uncertainties coming from the reference data used in the calculation.
Thanks for this proposition. Using min/max and a log10 function is a great way of handling the very different figures we have. Here are my comments:
- The rounding function is not always working. I think we should keep the existing one, which handles all cases: https://github.com/Boavizta/boaviztapi/blob/795bbb2334d4e39cdd4954e3bea878fd20ed9e4c/boaviztapi/utils/roundit.py#L33
- Precision could be set by default from the config file and overridden if needed (I don't see the use case for now).
```python
def round_value(val, min_val, max_val, precision=config["default_precision"]):
    """
    Rounds the value based on the delta between the min and max values returned and a precision parameter.
    """
    # value for precision% of the min-max delta
    approx = (max_val - min_val) / (100 / precision)
    significant = math.floor(math.log10(approx))
    return float(to_precision(val, significant))
```
I can make a PR with this implementation if you think it's ok.
I believe there is an issue with the approx variable. The bigger the difference between min and max, the bigger approx is and the larger the number of significant figures. I believe we want the opposite. See some examples:
value | approx | min_val | max_val | log10(approx) |
---|---|---|---|---|
20.29217 | 7.0896360000000005 | 10.91891 | 81.81527 | 0 |
1819.821672 | 530.9138167884 | 110.14710120000001 | 5419.285269084 | 2 |
0.02040328338 | 2.0873039999997484e-06 | 0.020400523740000003 | 0.02042139678 | -6 |
0.00030760589391948 | 0.0001208433425274 | 6.3406418256e-05 | 0.00127183984353 | -4 |
306.0165 | 95.3682 | 179.92950000000002 | 1133.6115 | 1 |
61648.853641199996 | 224191.01528027997 | 62.2570572 | 2241972.40986 | 5 |
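For reference, a small sketch that reproduces the approx and log10(approx) columns above (assuming the precision used for these examples was 10%):

```python
import math

rows = [
    # (value, min_val, max_val) taken from the table above
    (20.29217, 10.91891, 81.81527),
    (1819.821672, 110.14710120000001, 5419.285269084),
    (306.0165, 179.92950000000002, 1133.6115),
]

for value, min_val, max_val in rows:
    approx = (max_val - min_val) / (100 / 10)  # precision assumed to be 10%
    print(value, approx, math.floor(math.log10(approx)))
```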
I think the issue comes from using significant as a parameter for the to_precision method. See for example with a value of 6.2301, min = 3.617 and max = 10.023 (and 10% precision):
- round_value (original) for 3.617 < 6.2301 < 10.023, precision 10% = 6.2
- round_value (with to_precision) for 3.617 < 6.2301 < 10.023, precision 10% = 0.0
The second option is obviously wrong; it is not even in the range between min and max!
The confusion probably comes from the naming of my significant variable, which is not really the number of significant figures, at least not the way it is calculated by significant_number(x):
https://github.com/Boavizta/boaviztapi/blob/795bbb2334d4e39cdd4954e3bea878fd20ed9e4c/boaviztapi/utils/roundit.py#L9C18-L9C18
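To make the mismatch concrete, here is a minimal sketch (round_to_sig_figs is a hypothetical stand-in for a significant-figures rounding such as to_precision, not the actual roundit.py implementation): significant is a power-of-ten exponent, not a count of significant figures, so passing -1 to a sig-figs rounding collapses the value to 0.0.

```python
import math

def round_to_sig_figs(val, sig_figs):
    # hypothetical stand-in: keep `sig_figs` significant figures of `val`
    if val == 0:
        return 0.0
    return round(val, sig_figs - 1 - math.floor(math.log10(abs(val))))

val, min_val, max_val, precision = 6.2301, 3.617, 10.023, 10
approx = (max_val - min_val) / (100 / precision)           # ~0.64
significant = math.floor(math.log10(approx))               # -1: a power-of-ten exponent
print(round(val / 10 ** significant) * 10 ** significant)  # 6.2 (original round_value)
print(round_to_sig_figs(val, significant))                 # 0.0 (treating -1 as a sig-fig count)
```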
I'll make a PR with a set of test cases; it will be easier to discuss the implementation.
I've added a PR with a rounding function that handles corner cases: #220
I am currently implementing the function in the code.
I think that there is a problem when min = max = value. In that case, we don't round at all, even though this gives a precision which is way too high compared to the uncertainty of the impact factors.
Example
"gwp": {
"embedded": {
"value": 23.77907,
"min": 23.77907,
"max": 23.77907,
"warnings": [
"End of life is not included in the calculation"
]
},
"use": {
"value": 243.69843455999998,
"min": 243.69843455999998,
"max": 243.69843455999998
},
"unit": "kgCO2eq",
"description": "Total climate change"
},
Solutions
To account for the uncertainty of the impact factors, we could:
- Hard-code a maximal number of significant figures
- Apply a ratio of x% to the maximum and minimum values, which corresponds to the uncertainty of the impact factor (see the sketch below).
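A rough sketch of the second option (round_with_factor_uncertainty and impact_factor_uncertainty are hypothetical names, and it reuses the round_value sketch from earlier in this thread): widen min and max by the ±x% uncertainty of the impact factors before rounding, so the rounding still applies when min == max == value.

```python
def round_with_factor_uncertainty(val, min_val, max_val,
                                  impact_factor_uncertainty=0.10,
                                  precision=10):
    # widen the bounds by the +/- x% uncertainty of the impact factors
    widened_min = min_val * (1 - impact_factor_uncertainty)
    widened_max = max_val * (1 + impact_factor_uncertainty)
    return round_value(val, widened_min, widened_max, precision)

# With min == max == value, the widened bounds still drive the rounding:
print(round_with_factor_uncertainty(23.77907, 23.77907, 23.77907))  # -> 23.8
```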
What are your thoughts about it?
Yes, as the rounding is based on the difference between min and max, when min == max it does nothing (and it's by design ;))
I believe the "right way" to handle this would be to specify an uncertainty on the base impact factor and propagate it when calculating the overall impact (the uncertainties python library could help here https://pythonhosted.org/uncertainties/ ) . This should probably be another issue and PR, for the next version.
For now, I think we should fall back, in this specific case, to the current method where we simply cut after a fixed number of sig_fig.