activity-browser
Add CPC and ISIC classification to metadatastore so that it can be used for aggregation
Just like we can already aggregate LCIA results (e.g. in process contributions) by reference product, name or location, we should add CPC and ISIC classification data (not sure if the latter is imported by bw) to the metadatastore so that we can aggregate process contributions by these.
Update to this: The data is imported to the metadatastore (see #632).
Thoughts/Update to make this more useable:
current state
We already load `classifications` to `MetaDataStore`. It's easiest to see the output by adding `classifications` to this list and opening an ecoinvent database in AB. See here for example output in AB.
classification systems
`classifications` is a list of tuples; each tuple contains the classification system and its class, e.g. `[('ISIC rev.4 ecoinvent', '2420:Manufacture of basic precious and other non-ferrous metals'), ('CPC', '34612: Ammonium sulphate')]`. It seems two systems are available for most (all?) activities: `ISIC rev.4` and `CPC`.
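To make the data format concrete, here is a minimal sketch of how one class could be picked out of such a list. The helper names `get_class` and `split_class` are hypothetical (not part of AB or Brightway), and matching on the start of the system name is an assumption based on the `'ISIC rev.4 ecoinvent'` label in the example.

```python
from typing import Optional

# Example value taken from the description above.
classifications = [
    ("ISIC rev.4 ecoinvent", "2420:Manufacture of basic precious and other non-ferrous metals"),
    ("CPC", "34612: Ammonium sulphate"),
]


def get_class(classifications, system: str) -> Optional[str]:
    """Return the class of the first entry whose system name starts with `system`.

    Returns None if the activity has no classifications or no matching system.
    """
    for entry_system, entry_class in classifications or []:
        if entry_system.startswith(system):
            return entry_class
    return None


def split_class(value: str):
    """Split 'code:description' into (code, description), trimming whitespace."""
    code, _, description = value.partition(":")
    return code.strip(), description.strip()


isic = get_class(classifications, "ISIC rev.4")
cpc_code, cpc_name = split_class(get_class(classifications, "CPC"))
```

Prefix matching means `"ISIC rev.4"` also finds `"ISIC rev.4 ecoinvent"`; an exact match would be stricter but would need the full label.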
TODO
- [x] Figure out if `ISIC rev.4` or `CPC` would be 'better' for LCA data, e.g. more focused on technical vs agricultural products (documentation for both systems in ISIC/CPC links above)
  - Answer:
    - > The CPC as a classification of products has a strong natural relationship with the classification of economic activities, ISIC. The CPC and ISIC are both general-purpose classifications, with the ISIC representing the activity side of these two interrelated United Nations classifications.
    - See also section 14 of the CPC documentation
    - While for use in the database view ISIC or CPC would not matter much, for use in contribution analysis we'd best use ISIC.
- [x] For both systems, different versions exist. Figure out whether different ecoinvent versions use different versions of the classifications (pro for ISIC: it has rev.4 in its name, so we at least know the version)
  - Answer:
    - ISIC has 2 advantages over CPC:
      - It has a version name in Brightway (so we know which ISIC files we need)
      - rev.4 has been in use since 2008, so the revision doesn't seem to change all that often. I checked ecoinvent versions 3.6, 3.7 and 3.8 and they all use rev.4
- [x] Decide which classification would be best to use in AB (we could also implement both, but that's more work to maintain once versions change etc.)
  - Answer: ISIC seems preferable
using a system in AB
There are a few obvious places to use classification in AB
- In the `Activities` table in AB
  - The easy version would be to just show a classification
  - The cool way would be to implement a tree view for the database based on the classification (see also #632)
    - AB would need to detect whether the shown database has classifications and enable a tree/list view option, like we have with impact categories, to view the tree. (We could just always show the radio buttons, and grey out the tree if it's not compatible with the database.)
      - Tangent on a tangent: we could also make a tree view for `biosphere` databases based on their compartments
- In process contribution analysis of results
  - Here, the user would be able to filter on a certain level of the classification
  - An even cooler way would be to filter on all levels at once with a sunburst graph
  - We could even show a tree view of result impacts following the classifications
    - Of course this would be better off in its own tab, as it wouldn't be related to the 'normal' process contributions tab anymore
- Less interesting, but could still be nice: allowing users to add a classification to their own processes. We could just add the field in the same section as `Name`/`Location`/`Database` (we shouldn't add it to the `Products` table if we use ISIC, as ISIC is for activities and CPC is for products)
considerations for all uses
For both uses of a classification, we'd need a method to properly show the classification in the `MetaDataStore`. The current list with multiple methods is nice, but we'd need to extract the right information efficiently. Additionally, we need to know more about the structure of the classifications. This would mean storing the structure somewhere in AB, either as a variable in Python code or as a file on disk (or look it up dynamically from the internet? The links above do have a `.txt` with the full classification system).
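One possible shape for the stored structure, sketched under assumptions: a JSON document keyed by classification version, each holding a flat code-to-description mapping, with parents found by code prefix. The layout and the names `STRUCTURE_JSON` and `ancestors` are hypothetical, and only a handful of ISIC rev.4 entries are included for illustration.

```python
import json

# Hypothetical on-disk layout: one code -> description mapping per version.
STRUCTURE_JSON = """
{
  "ISIC rev.4": {
    "C": "Manufacturing",
    "24": "Manufacture of basic metals",
    "242": "Manufacture of basic precious and other non-ferrous metals",
    "2420": "Manufacture of basic precious and other non-ferrous metals"
  }
}
"""


def ancestors(code: str, structure: dict):
    """Return [(code, description)] from the division down to `code` itself.

    Parents are found by shortening the numeric code; letter sections are
    not reachable this way and would need an explicit division -> section map.
    """
    path = []
    for length in range(2, len(code) + 1):
        prefix = code[:length]
        if prefix in structure:
            path.append((prefix, structure[prefix]))
    return path


structures = json.loads(STRUCTURE_JSON)
path = ancestors("2420", structures["ISIC rev.4"])
```

Keying by version name would let one project hold several versions side by side, which touches on the multi-version question, but it does not settle where the file should live.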
TODO
- [x] Find an efficient system to get to the right class of an activity. Extracting and storing it as an extra column in `MetaDataStore` perhaps? Or would only reading the list-->tuple when needed be faster?
  - Answer: We can add a function call here like in my branch here
    - This will provide us with the correct column(s), but will come at a performance hit during loading of ~0.4s/column for ecoinvent. To me this seems reasonable: we need to read every line in the database, break up the `classifications` list, look at the tuples to see if we have the correct one, and then transfer the data to a new column.
- [ ] What is a good way to store the overarching classification system structure somewhere?
  - A file? A Python variable?
  - Where do we store it?
  - How do we manage multiple versions of the classification?
  - How do we manage multiple ecoinvent versions in one project using different classification versions?
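The eager "extra column" approach from the first TODO could look roughly like the sketch below. It uses plain dicts as stand-ins for metadata rows so it runs without dependencies; the function name `add_classification_columns` is hypothetical and this is not the code from the branch mentioned above. With a pandas dataframe the same per-row logic could be applied to the `classifications` column.

```python
def add_classification_columns(rows, systems):
    """Add one key per requested system to every row, in a single pass.

    Each row is a dict that may contain a 'classifications' list of
    (system, class) tuples; rows without a match get None.
    """
    for row in rows:
        found = {}
        for entry_system, entry_class in row.get("classifications") or []:
            for system in systems:
                # Keep the first match; prefix matching is an assumption.
                if system not in found and entry_system.startswith(system):
                    found[system] = entry_class
        for system in systems:
            row[system] = found.get(system)
    return rows


rows = [
    {
        "name": "activity with classifications",
        "classifications": [
            ("ISIC rev.4 ecoinvent", "2420:Manufacture of basic precious and other non-ferrous metals"),
            ("CPC", "34612: Ammonium sulphate"),
        ],
    },
    {"name": "user-made activity without classifications"},
]

add_classification_columns(rows, ["ISIC rev.4", "CPC"])
```

Extracting all requested systems in one pass means each `classifications` list is only walked once, rather than once per column.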
After all TODOs above have been figured out, I think we can start looking at actual implementations for the Activities table, process contributions, or other uses.
Any help with TODOs from anyone is highly appreciated!