activity-browser icon indicating copy to clipboard operation
activity-browser copied to clipboard

`#NUM!` values in `amount` and `loc` columns in XLSX are imported as the number 36 by the Activity Browser.

Open mixib opened this issue 4 years ago • 2 comments

Consider an XLSX file in the brightway import format, with the value #NUM! for some of the exchanges amount and loc columns.

bug01_XLinput

Following the way in which amounts are calculated based on formulas containing parameters on import, I expected that the amount would be calculated as a mean value based on the uncertainty data. Instead it shows as the number 36 on the Activity Browser. Only when modifying the uncertainty fields exchange by exchange there is a prompt to update the amount to match with the mean.

bug01_prompt

According to brightway's documentation, the uncertainty type fields are defined in the package stats_arrays. When using a triangular distribution (ID-5), it is expected that when the mode is not provided, the loc value becomes the average between the minimum and the maximum values. Then, I expected #NUM! in the loc field to be replaced by the calculated loc. Instead it shows as the number 36 on the Activity Browser.

bug01_loc36

Is this normal behaviour? If so, is it documented that the LCA results tab reports on calculations based on the amount and not on the uncertainty fields?

mixib avatar Feb 01 '21 16:02 mixib

This... is pretty bizarre. Can you forward the information about how distributions make up for missing data?

So this #NUM! field is always translated to 36 during the import? That shouldn't be normal behaviour. But otherwise it kind of is normal behaviour that the AB (and I think brightway) make use of the amount field by default and only look into the uncertainty fields when performing Monte Carlo analysis or similar.

dgdekoning avatar Feb 02 '21 16:02 dgdekoning

Hi @dgdekoning, I edited my original reply since I looked further into it and found more information.

Static vs stochastic LCA

But otherwise it kind of is normal behaviour that the AB (and I think brightway) make use of the amount field by default and only look into the uncertainty fields when performing Monte Carlo analysis or similar.

It makes sense since the default calculation is static. Indeed, the brightway documentation on the stochastic LCA mentions that the random numbers created according to the uncertainty fields overwrite the amount field.

Sub-issue 1: The bw/AB importer interprets XL error codes as numbers

So this #NUM! field is always translated to 36 during the import?

Yes. In fact, every error value in an XLS/XLSX file translates to a specific number. It is irrelevant if the error value is under amount or under an uncertainty field. Here is some information on the xlrd error codes matching the error values. However, it is very strange that bw/AB read them as numbers when xlrd uses the error tag. Parsing all of them to np.nan would be more appropriate, right?

When the uncertainty fields are used, the bw/AB exporter may fill some fields with #NUM values. The problem comes when the XLSX file is imported back to bw/AB because the #NUM values are parsed as the number corresponding to the xlrd error codes instead of an empty field.

This is how it looks like at the xlrd level:

bug01_xlrd

import xlrd
book = xlrd.open_workbook(myfile.xlsx)
sh = book.sheet_by_index(0)

for rx in range(sh.nrows):
    print(sh.row(rx))
Out:
[text:"Content is '=#NUM'", error:36]
[text:"Content is '=#N/A'", error:42]
[text:"Content is '=#NULL!'", error:0]
[text:"Content is '=1/0'", error:7]
[text:"Content is '=#REF!'", error:23]
[text:"Content is '=#NAME?'", error:29]
[text:"Content is '=#VALUE!'", error:15]
[text:"Content is '=PI()'", number:3.141592653589793]
[text:"Content is '42'", number:42.0]

Sub-issue 2: Outdated documentation of bw's stats_arrays

Can you forward the information about how distributions make up for missing data?

It seems that the documentation of the stats_arrays package is not up to date. The table under 'Mapping parameter array columns to uncertainty distributions' provides an overview of the supposedly required and optional parameters. If the optional parameters aren't specified, then these parameters are supposed to be calculated by default. See an excerpt of the table below.

bug01_defaultCalculations

However, testing with an example, if no loc value is specified for the triangular uncertainty or if it is np.nan, then the random number generator yields nan. And if there is no minimum value specified for the uniform or triangular uncertainty, or if it is np.nan, then there is an ImproperBoundsError message associated to these lines:

https://github.com/brightway-lca/stats_arrays/blob/68519a26e6ffe5ed7cefbaadbcdcc048efe75994/stats_arrays/distributions/base.py#L301-L306

This is how it looks like at the stats_arrays level:

from stats_arrays import *
import numpy as np

my_variable = UncertaintyBase.from_dicts(
    {'uncertainty_type': TriangularUncertainty.id,
     'loc': np.nan,#valid input, unexpected output
     'minimum': 15.0, #valid input
     'maximum': 25.0},
    {'uncertainty_type': DiscreteUniform.id,
     'minimum': np.nan, #valid input, expected output
     'maximum': 25.0},
    {'uncertainty_type': UniformUncertainty.id,
     # 'minimum': np.nan, #invalid input
     'minimum': 15, # valid input
     'maximum': 25.0}
)
my_rng = MCRandomNumberGenerator(my_variable)
print(list(zip(my_rng, range(10))))
Out:
[(array([        nan,  7.        , 17.37254671]), 0),
 (array([        nan, 18.        , 22.36840769]), 1),
 (array([        nan, 12.        , 22.60361591]), 2),
 (array([        nan,  7.        , 22.05667132]), 3),
 (array([        nan,  3.        , 16.50150731]), 4),
 (array([        nan, 23.        , 17.09835987]), 5),
 (array([        nan,  1.        , 20.91978931]), 6),
 (array([        nan, 19.        , 18.35423132]), 7),
 (array([        nan, 21.        , 21.61874973]), 8),
 (array([        nan, 22.        , 19.60915629]), 9)]

I think that the triangular and uniform uncertainty are missing the files to deal with the variables marked as optional. For instance, there is a file for the discrete uniform distribution additional to the base.py file. In the additional file, the lack of the optional minimum value is addressed in the next lines:

https://github.com/brightway-lca/stats_arrays/blob/68519a26e6ffe5ed7cefbaadbcdcc048efe75994/stats_arrays/distributions/discrete_uniform.py#L29-L33

mixib avatar Feb 04 '21 16:02 mixib

Hi Mixib,

so I've been going through this and am finding that brightway (version 2) is dealing with these errors slightly differently now. When I've introduced them what occurs is that brightway excludes the 'amount' field, which is personally not an ideal situation, however it is safe and does allow us to throw some warnings/errors.

What I would propose to do myself is to introduce an error message that points to the activity in the database with an exchange that is absent (so should probably contain an error)

Zoophobus avatar Jan 25 '23 12:01 Zoophobus

Closing as resolved in #921

marc-vdm avatar Sep 14 '23 15:09 marc-vdm