
Add CPC and ISIC classification to metadatastore so that it can be used for aggregation

Open · bsteubing opened this issue 3 years ago · 1 comment

Just like we can already aggregate LCIA results (e.g. in process contributions) by reference product, name or location, we should add CPC and ISIC classification data (not sure if the latter is imported by Brightway) to the metadatastore so that we can aggregate process contributions by these classifications as well.

bsteubing, May 07 '21

Update: the classification data is now imported into the metadatastore (see #632).

marc-vdm, Feb 10 '22

Thoughts and an update on making this more usable:

current state

We already load classifications into the MetaDataStore. The easiest way to see the output is to add classifications to this list and open an ecoinvent database in AB. See here for example output in AB.

classification systems

`classifications` is a list of tuples; each tuple contains a classification system and the activity's class in that system, e.g. [('ISIC rev.4 ecoinvent', '2420:Manufacture of basic precious and other non-ferrous metals'), ('CPC', '34612: Ammonium sulphate')]. It seems two systems are available for most (all?) activities: ISIC rev.4 and CPC.
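A minimal sketch of reading this list into something easier to query, assuming every class string has the form `code:description` as in the example above (note the CPC entry has a space after the colon, the ISIC one does not). The function name `parse_classifications` is hypothetical, not an existing AB or Brightway function:

```python
def parse_classifications(classifications):
    """Turn [(system, "code:description"), ...] into {system: (code, description)}.

    Assumes the 'code:description' format seen in ecoinvent; a string without a
    colon ends up as the code with an empty description.
    """
    parsed = {}
    for system, label in classifications:
        # Split on the first colon only, since descriptions may contain colons too.
        code, _, description = label.partition(":")
        parsed[system] = (code.strip(), description.strip())
    return parsed


example = [
    ("ISIC rev.4 ecoinvent", "2420:Manufacture of basic precious and other non-ferrous metals"),
    ("CPC", "34612: Ammonium sulphate"),
]
print(parse_classifications(example)["CPC"])
# ('34612', 'Ammonium sulphate')
```

Separating the numeric code from the description matters because the code is what carries the hierarchy; the description is only for display.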

TODO

  • [x] Figure out whether ISIC rev.4 or CPC would be 'better' for LCA data, e.g. more focused on technical vs. agricultural products (documentation for both systems in the ISIC/CPC links above)
    • Answer:
    • The CPC as a classification of products has a strong natural relationship with the classification of economic activities, ISIC. The CPC and ISIC are both general-purpose classifications, with the ISIC representing the activity side of these two interrelated United Nations classifications.

    • See also section 14 of the CPC documentation
    • For use in the database view, ISIC or CPC would not matter much, but for contribution analysis we'd best use ISIC, since process contributions are per activity and ISIC classifies activities (CPC classifies products).
  • [x] For both systems, different versions exist. Figure out whether different ecoinvent versions use different classification versions. (A point in favor of ISIC: its system name includes rev.4, so at least we know the version.)
    • Answer:
    • ISIC has 2 advantages over CPC:
      • It has a version name in Brightway (so we know which ISIC files we need)
      • rev.4 has been in use since 2008, so the version doesn't seem to change often. I checked ecoinvent versions 3.6, 3.7 and 3.8 and they all use rev.4
  • [x] Decide which classification would be best to use in AB (we could also implement both, but that's more work to maintain once versions change etc.)
    • Answer: ISIC seems preferable

using a system in AB

There are a few obvious places to use classifications in AB:

  1. In the Activities table in AB
    • The easy version would be to just show a classification
    • The cool way would be to implement a tree view for the database based on the classification (see also #632)
      • AB would need to detect whether the shown database has classifications and, if so, enable a tree/list view option, like we have for impact categories. (We could also always show the radio buttons and grey out the tree option if the database doesn't support it.)
        • tangent on a tangent: We could also make a tree view for biosphere databases based on their compartments
  2. In process contribution analysis of results
    • Here, users would be able to filter at a certain level of the classification
    • An even cooler way would be to filter on all levels at once with a sunburst graph
      • We could even show a tree view of result impacts following the classification hierarchy
      • Of course this would be better off in its own tab, as it wouldn't be related to the 'normal' process contributions tab anymore
  3. Less interesting, but still nice: allowing users to add a classification to their own processes. We could just add the field in the same section as Name/Location/Database (we shouldn't add it to the Products table if we use ISIC, as ISIC is for activities and CPC is for products)
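To make "filter at a certain level" concrete: in ISIC rev.4 a 4-digit class code also encodes its division (first 2 digits) and group (first 3 digits), so contributions can be summed at any of those levels from the code alone. A minimal sketch, assuming the contributions arrive as `(isic_code, score)` pairs with `None` for unclassified activities; `ISIC_LEVELS` and `aggregate_by_isic_level` are hypothetical names, not existing AB code:

```python
from collections import defaultdict

# Number of leading digits of an ISIC rev.4 class code that identify each level.
ISIC_LEVELS = {"division": 2, "group": 3, "class": 4}


def aggregate_by_isic_level(contributions, level):
    """Sum scores per ISIC code prefix at the requested level.

    contributions: iterable of (isic_code, score) pairs, e.g. one per process.
    Activities without a code are collected under 'unclassified'.
    """
    n_digits = ISIC_LEVELS[level]
    totals = defaultdict(float)
    for code, score in contributions:
        key = code[:n_digits] if code else "unclassified"
        totals[key] += score
    return dict(totals)


contributions = [("2420", 5.0), ("2410", 3.0), ("2011", 1.5), (None, 0.5)]
print(aggregate_by_isic_level(contributions, "division"))
# {'24': 8.0, '20': 1.5, 'unclassified': 0.5}
```

Running this once per level yields the nested totals a sunburst or tree view would need. The top level (sections A to U, identified by letters) cannot be derived from the digits, which is one reason the classification structure has to be stored somewhere.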

considerations for all uses

For all uses of a classification, we'd need a way to properly expose the classification in the MetaDataStore. The current list with multiple systems is nice, but we'd need to extract the right information efficiently. Additionally, we need to know more about the structure of the classifications, i.e. how classes group into higher levels. This would mean storing the structure somewhere in AB, either as a variable in Python code or as a file on disk (or looking it up dynamically online? The links above do have a .txt with the full classification system).
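As an illustration of the "variable in Python code" option, a sketch with a small hand-copied excerpt of ISIC rev.4; a real table would be generated from the full structure file. `ISIC_REV4`, `SECTION_OF_DIVISION` and `hierarchy` are hypothetical names:

```python
# Hypothetical excerpt of the ISIC rev.4 structure: code -> title.
ISIC_REV4 = {
    "C": "Manufacturing",
    "24": "Manufacture of basic metals",
    "242": "Manufacture of basic precious and other non-ferrous metals",
    "2420": "Manufacture of basic precious and other non-ferrous metals",
}
# Sections cover ranges of divisions; section C spans divisions 10 to 33.
# Only section C is filled in for this excerpt.
SECTION_OF_DIVISION = {f"{d:02d}": "C" for d in range(10, 34)}


def hierarchy(code):
    """Return [(code, title), ...] from the section down to the given class.

    Levels missing from the excerpt are skipped (section) or shown as '?' (titles).
    """
    path = [SECTION_OF_DIVISION.get(code[:2])]
    path += [code[:n] for n in (2, 3, 4) if len(code) >= n]
    return [(c, ISIC_REV4.get(c, "?")) for c in path if c is not None]


for level_code, title in hierarchy("2420"):
    print(level_code, title)
```

Keeping the division-to-section mapping separate reflects that section letters are not part of the numeric codes.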

TODO

  • [x] Find an efficient way to get the right class of an activity. Extract it and store it as an extra column in the MetaDataStore? Or would reading the list of tuples only when needed be faster?
    • Answer: We can add a function call here like in my branch here
    • This gives us the correct column(s), at the cost of about 0.4 s of extra loading time per column for ecoinvent. That seems reasonable to me: we need to read every row in the database, break up its classifications list, check the tuples for the right system, and then transfer the data to a new column.
  • [ ] What is a good way to store the overall structure of the classification system?
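A rough sketch of the "extra column" approach. It uses plain dicts as rows so it runs on its own, whereas the real MetaDataStore is DataFrame-based; `add_classification_column` is a hypothetical name, not the function from the branch mentioned above:

```python
def add_classification_column(rows, system):
    """Add a key named after `system` to every row, holding its class or None.

    rows: list of dicts, each optionally with a 'classifications' list of
    (system, class) pairs; 2-element lists work as well as tuples.
    """
    for row in rows:
        # One dict per row makes finding the wanted system a single hash lookup.
        lookup = dict(row.get("classifications") or [])
        row[system] = lookup.get(system)
    return rows


rows = [
    {"name": "ammonium sulphate production", "classifications": [("CPC", "34612: Ammonium sulphate")]},
    {"name": "my own process"},
]
add_classification_column(rows, "CPC")
print([row["CPC"] for row in rows])
# ['34612: Ammonium sulphate', None]
```

This is still one pass over every activity per column, in line with the per-column loading cost described above.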
    • A file? A python variable?
    • Where do we store it?
    • How do we manage multiple versions of the classification?
    • How do we manage multiple ecoinvent versions in one project using different classification versions?
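One possible answer to the versioning questions, sketched under the assumption that structures are keyed by the exact system string found in the `classifications` tuples (e.g. 'ISIC rev.4 ecoinvent'). Each database then resolves to the structure matching its own system and version, and a database with no known system gets a greyed-out tree view. `STRUCTURES` and `known_systems` are hypothetical names:

```python
# Hypothetical registry: one structure per classification system *and* version,
# keyed by the system string as it appears in the activities' classifications.
# Entries are abbreviated; in practice each would be loaded from a file.
STRUCTURES = {
    "ISIC rev.4 ecoinvent": {"24": "Manufacture of basic metals"},
    # A future version would get its own entry under its own system name.
}


def known_systems(activities):
    """Return the classification systems used in a database that have a stored structure.

    activities: iterable of dicts with an optional 'classifications' list of
    (system, class) pairs. An empty set means the tree view should be disabled.
    """
    used = set()
    for act in activities:
        for system, _ in act.get("classifications") or []:
            used.add(system)
    return used.intersection(STRUCTURES)
```

Because the lookup is per database, two ecoinvent versions in one project could point at different structure entries without interfering.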

Once all TODOs above have been figured out, I think we can start looking at actual implementations for the Activities table, process contributions or other uses.

Any help with TODOs from anyone is highly appreciated!

marc-vdm, Nov 23 '22