
There is no concept to connect sources in the OEP

Open christian-rli opened this issue 5 years ago • 12 comments

If a table comes from one source it's fairly simple to connect the source to the original data.

As soon as there are two sources for one table it's not possible in the metadata to selectively connect the information to the different sources.

When a table has a source column, there should be a way to link each entry to an identifier such as a BibTeX key, and then either reference the source (e.g. by DOI or an absolute source path) or provide the source yourself. This identifier can then be referenced in the metadata.

christian-rli avatar Feb 24 '20 14:02 christian-rli

There is a deprecated section for literature at the OEP. But other projects do it better:

  • https://scholar.google.de/
  • https://www.base-search.net/about/de/
  • https://www.science.gov/
  • http://www.jurn.org/#gsc.tab=0

Ludee avatar Feb 24 '20 15:02 Ludee

Solution A:

  • Each source in the metadata has a key entry = bibkey.
  • Each source in the metadata has an additional doi.
  • In the data, there is a column source/bibkey linking to the metadata.

Ludee avatar Feb 24 '20 15:02 Ludee

Solution A: Each source in the metadata has a key entry = bibkey. Each source in the metadata has an additional doi. In the data, there is a column source/bibkey linking to the metadata.

Repeating this to make sure I understand correctly: This will need two new keys in the metadata sources object: bibkey and doi. The value of 'bibkey' can also be found in a cell of the data in a column named 'source' or 'bibkey'. The value of 'doi' is only a doi, so it needs to be connected directly to the 'bibkey'. The structure needs to make sure that only one 'doi' belongs to one 'bibkey'. Did I understand correctly @Ludee ?

Does this also mean that with this solution only one source can be referenced per row? Would it make sense to recommend an extra column in the data that contains the doi? It would make it possible to match bibkey and doi automatically for the metadata. That function would still need to be written, of course, but it might make creating the sources for the metadata considerably easier.
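
A minimal sketch of what such a matching function could look like, assuming each data row carries a `bibkey` and a `doi` column. The function name `build_sources`, the row layout, and the two keys in the output are illustrative assumptions based on Solution A; none of them exist in the released OEMetadata schema. The sample values are the key and DOI from the modex example in this thread.

```python
def build_sources(rows):
    """Collect one metadata source entry per bibkey from data rows.

    Each row is a dict with the assumed columns 'bibkey' and 'doi'.
    Raises ValueError if one bibkey is paired with two different DOIs,
    which enforces the "only one doi per bibkey" rule discussed above.
    """
    doi_by_key = {}
    for row in rows:
        key, doi = row["bibkey"], row["doi"]
        known = doi_by_key.setdefault(key, doi)
        if known != doi:
            raise ValueError(f"bibkey {key!r} maps to both {known!r} and {doi!r}")
    # 'bibkey' and 'doi' are the proposed keys, not part of the released schema.
    return [{"bibkey": key, "doi": doi} for key, doi in doi_by_key.items()]


rows = [
    {"id": 1, "bibkey": "Vartiainen2019", "doi": "10.1002/pip.3189"},
    {"id": 2, "bibkey": "Vartiainen2019", "doi": "10.1002/pip.3189"},
]
sources = build_sources(rows)
```

Repeated rows with the same bibkey collapse into a single source entry, so several rows can share one source. Referencing several sources from one row would need a different column layout.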

christian-rli avatar Jul 22 '20 12:07 christian-rli

In modex we use the description field within a source (see example below) to add a table and source identifier to the metadata. We use the BibTeX key as the identifier, but that is not strict: it could be any identifier value (e.g. a primary key id) that references a specific row in the data tables. That way, we connect the source in the metadata with the BibTeX file (if available). As we use the oedatamodel, we have a source column in the table. We insert the full BibTeX citation and/or key there, which also connects the sources in the OEMetadata to the database table row (by id) and/or BibTeX key. This is not perfect yet, as we currently need to provide the BibTeX file within the datapackage, which is not compatible with the OEP. For future projects I would recommend adding the full citation text and BibTeX key (as backup) in the source column for each row.

Example:

OEMetadata 1.4.1
...
"sources": [
    {
        "title": "Impact of weighted average cost of capital, capital expenditure, and other parameters on future utility-scale {PV} levelised cost of electricity",
      ->"description": "[oed-table:scalar],[Bibtexkey:Vartiainen2019] - Impact of weighted average cost of capital, capital expenditure, and other parameters on future utility-scale PV levelised cost of electricity Progress in Photovoltaics: Research and Applications",
        "path": "10.1002/pip.3189",
        "licenses": [
            ...
        ]
    }
]
...

In the source column we can insert the same BibTeX key/full citation multiple times in different rows.
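
To read the identifiers back out of such a description, something like the following could work. This is only a sketch: it assumes the tags always follow the `[name:value]` pattern shown in the example above, and the names `TAG_PATTERN` and `parse_description_tags` are made up for illustration.

```python
import re

# Matches tags of the form [name:value], as in the modex example description.
TAG_PATTERN = re.compile(r"\[([^\[\]:]+):([^\[\]]+)\]")


def parse_description_tags(description):
    """Return the [name:value] tags embedded in a source description as a dict."""
    return {
        name.strip(): value.strip()
        for name, value in TAG_PATTERN.findall(description)
    }


description = (
    "[oed-table:scalar],[Bibtexkey:Vartiainen2019] - Impact of weighted average "
    "cost of capital, capital expenditure, and other parameters on future "
    "utility-scale PV levelised cost of electricity"
)
tags = parse_description_tags(description)
```

Here `tags["Bibtexkey"]` gives the key to look up in the source column of the table named by `tags["oed-table"]`.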


jh-RLI avatar May 03 '21 12:05 jh-RLI

This issue may need further discussion to find a convenient solution. Hence, it will not yet be considered in oemetadata release v1.5. The proposed workaround seems to work well for projects that work with bib files, but introducing the key bibkey might deserve a second thought.

chrwm avatar Oct 11 '21 14:10 chrwm

FYI @srhbrnds @henhuy

Problem scope:

data & metadata = datapackage

  • keep data & metadata user-friendly to maintain

Requirements on data & metadata (please add/edit):

  • identify data sources for each data point

  • identify the licence and usage rights of the complete datapackage easily --> it should be easy to find the licence information of the overall dataset and how to use it

  • ensure that the licences and rights of the sources that make up the dataset can be identified --> the licence information of each individual source of the dataset should be traceable

example table 1:

| id | region | year | storage_capacity | charging_power | fixed_cost | variable_cost | investment_cost | operational_life_time | mileage | bandwidth_type | version | method | source | comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Germany | 2019 | 35 | 4 | 0.027 | 0.044 | 35000 | 12 | 12127 | {'market_share': 'range'} | | | {'storage_capacity':'LucadeTena2018', 'charging_power':'LucadeTena2018', 'fixed_cost':'ADAC2023', 'variable_cost':'ADAC2023', 'investment_cost':'ADAC2023', 'operational_life_time':'deTena2018', 'mileage':'motointegrator2023', 'occupancy_rate':'InstitutfuerangewandteSozialwissenschaftGmbH2018', 'market_share':'ownAssumptions', 'charging_efficiency' :'ownAssumptions', 'feed_in_efficiency':'ownAssumptions', 'energy_conversion_efficiency':'LucadeTena2018'} | |
| 2 | Germany | 2025 | 35 | 4 | 0.027 | 0.044 | 35000 | 12 | 12127 | {'market_share': 'range'} | | | {'storage_capacity':'deTena2018', 'charging_power':'deTena2018', 'fixed_cost':'ADAC2023', 'variable_cost':'ADAC2023', 'investment_cost':'ADAC2023', 'operational_life_time':'deTena2018', 'mileage':'motointegrator2023', 'occupancy_rate':'Mueller2013', 'market_share':'ownAssumptions', 'charging_efficiency' :'ownAssumptions', 'feed_in_efficiency':'ownAssumptions', 'energy_conversion_efficiency':'deTena2018'} | |
| 3 | Germany | 2030 | 35 | 5.5 | 0.03 | 0.044 | 32000 | 12 | 12127 | {'market_share': 'range'} | | | {'storage_capacity':'deTena2018', 'charging_power':'deTena2018', 'fixed_cost':'ADAC2023', 'variable_cost':'ADAC2023', 'investment_cost':'ADAC2023', 'operational_life_time':'deTena2018', 'mileage':'motointegrator2023', 'occupancy_rate':'Mueller2013', 'market_share':'ownAssumptions', 'charging_efficiency' :'ownAssumptions', 'feed_in_efficiency':'ownAssumptions', 'energy_conversion_efficiency':'deTena2018'} | |

example table 2:

| id | scenario_id | region | year | input_energy_vector | output_energy_vector | technology | technology_type | parameter_name | value | unit | tags | method | source | comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1909 | 2 | ["Baltic"] | 2016 | air | electricity | wind turbine | offshore | installed capacity | 338.8 | MW | | | WirtschaftundEnergieCap | Dez 15 |
| 1910 | 2 | ["North"] | 2016 | air | electricity | wind turbine | offshore | installed capacity | 2956.1 | MW | | | WirtschaftundEnergieCap | Dez 15 |
| 1911 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH"] | 2016 | air | electricity | wind turbine | onshore | capital costs | 1288000 | €/MW | | {"value": "Interpolation 2015-2020"} | DEA2020 | p224 |
| 1912 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH"] | 2016 | air | electricity | wind turbine | onshore | fixed costs | 23280 | €/MW/a | | {"value": "Interpolation 2015-2020"} | DEA2020 | p224 |
| 1913 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH"] | 2016 | air | electricity | wind turbine | onshore | lifetime | 25.4 | years | | {"value": "Interpolation 2015-2020"} | DEA2020 | p224 |
| 1914 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH","North","Baltic"] | 2016 | air | electricity | wind turbine | offshore | capital costs | 2714000 | €/MW | | {"value": "Interpolation 2015-2020"} | DEA2020 | p245 |
| 1915 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH","North","Baltic"] | 2016 | air | electricity | wind turbine | offshore | fixed costs | 53851.8 | €/MW/a | | {"value": "Interpolation 2015-2020"} | DEA2020 | p245 |
| 1916 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH","North","Baltic"] | 2016 | air | electricity | wind turbine | offshore | lifetime | 25.4 | years | | {"value": "Interpolation 2015-2020"} | DEA2020 | p245 |
| 2133 | 2 | ["BB"] | 2016 | air | electricity | wind turbine | onshore | installed capacity | 5700.03 | MW | | | MaStR2021 | |
| 2134 | 2 | ["BE"] | 2016 | air | electricity | wind turbine | onshore | installed capacity | 11 | MW | | | MaStR2021 | |
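
As an illustration of the first requirement, the per-column source mapping in example table 1 can be resolved to a single data point roughly like this. It is a sketch that assumes the source cell is stored as a Python-style dict literal with single quotes, as shown in that table; the helper name `source_for` and the shortened sample row are hypothetical.

```python
import ast


def source_for(row, column):
    """Look up the source key recorded for one column of one data row.

    Assumes the 'source' cell holds a dict literal mapping column names
    to source keys. Returns None if the column has no recorded source.
    """
    mapping = ast.literal_eval(row["source"])
    return mapping.get(column)


# Shortened version of row 1 from example table 1.
row = {
    "id": 1,
    "fixed_cost": 0.027,
    "mileage": 12127,
    "source": "{'fixed_cost':'ADAC2023', 'mileage':'motointegrator2023'}",
}
fixed_cost_source = source_for(row, "fixed_cost")
```

If the column were stored as real JSON (double quotes), `json.loads` would be the more natural choice than `ast.literal_eval`.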

Proposed solution for OEM-1.6 or later:

  1. Introduce the key bibSources and keep the old structure within another key, e.g. library

Pro:

  • users have all sources structured in bibfile
  • with the example tables above, individual data points are traceable to individual sources.
  • users who don't work with bibfiles can still document their sources

Con:

  • (licence information of individual sources would move to the bibfile and would not be directly visible in the metadata) - not a con if this is not a requirement

"sources": {
    "bibSources": "http://url_to_bib_file_with_bib_file",
    "library": [
        {
            "title": null,
            "description": null,
            "path": null,
            "licenses": [
                {
                    "name": null,
                    "title": null,
                    "path": null,
                    "instruction": null,
                    "attribution": null
                }
            ]
        },
        {
            "title": null,
            "description": null,
            "path": null,
            "licenses": [
                {
                    "name": null,
                    "title": null,
                    "path": null,
                    "instruction": null,
                    "attribution": null
                }
            ]
        }
    ]
}

Licence information and usage rights for the entire datapackage remain easily accessible in:

"licenses": [
    {
        "name": null,
        "title": null,
        "path": null,
        "instruction": null,
        "attribution": null
    }
],
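
For existing metadata, the proposed structure could be produced by wrapping the current list-shaped sources. The following is a rough sketch under that assumption: the key names `bibSources` and `library` are taken from the proposal above, while the function name `wrap_sources` and the sample document are hypothetical.

```python
import copy


def wrap_sources(metadata, bib_url=None):
    """Return a copy of metadata whose list-shaped 'sources' is wrapped
    into the proposed object with the keys 'bibSources' and 'library'.

    Metadata that already uses the object shape is returned unchanged.
    """
    migrated = copy.deepcopy(metadata)
    sources = migrated.get("sources", [])
    if isinstance(sources, list):
        migrated["sources"] = {"bibSources": bib_url, "library": sources}
    return migrated


old = {
    "sources": [{"title": "Some source", "path": None, "licenses": []}],
    "licenses": [{"name": None}],
}
new = wrap_sources(old, bib_url="http://url_to_bib_file_with_bib_file")
```

The top-level licenses key is left untouched, so the licence of the entire datapackage stays where it is today.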

chrwm avatar Mar 21 '23 13:03 chrwm

There is no designated field for licences in a bibfile. The bibfile field note is shown by default in the bibliography and could be used for licence information.

chrwm avatar Apr 06 '23 12:04 chrwm

I like the idea suggested by @chrwm. This would completely separate the recommended way of source management from the metadata. Of course, the current solution would still be available.

I will add concerns that might be relevant. Perhaps we decide to include your solution first and try to resolve the issues later.

One objection for me would be that we would require all users to use the Bibtex format if they want to cite sources and link them efficiently in the data. (Maybe this is not even bad)

Another point is that if we keep the current format and use the Bibtex format, we would have to handle two formats if we want to display the sources in the OEP, for example. We would have to save the Bibtex file and read from it to display the source information on the website. (This is feasible but extra work)

jh-RLI avatar Apr 06 '23 14:04 jh-RLI

One objection for me would be that we would require all users to use the Bibtex format if they want to cite sources and link them efficiently in the data. (Maybe this is not even bad)

In my solution the current way of handling sources and the new key bibSources would exist in parallel, so people could use both.

I agree with the other concern.

chrwm avatar Apr 06 '23 15:04 chrwm

Okay, I thought the use case you presented would add more features, but it is aimed at usability - then there is no concern :)

jh-RLI avatar Apr 06 '23 15:04 jh-RLI

From today's meeting: A link to bibSources seems to be accepted as an extra OEM-key. The question of whether one should be able to see the licences of the individual sources in OEMetadata in addition to the total licence of the resource (without having to look into the source) is still open.

chrwm avatar Apr 11 '23 13:04 chrwm

To pick things up again - I propose to implement the following solution:

  1. add a link field that links to a file containing the sources, e.g. a bibfile.
  2. move current resource-sources into another field, e.g. individual

"sources": {
    "link": "http://url_to_bib_file_with_bib_file",
    "individual": [
        {
            "title": null,
            "description": null,
            "path": null,
            "licenses": [
                {
                    "name": null,
                    "title": null,
                    "path": null,
                    "instruction": null,
                    "attribution": null
                }
            ]
        },
        {
            "title": null,
            "description": null,
            "path": null,
            "licenses": [
                {
                    "name": null,
                    "title": null,
                    "path": null,
                    "instruction": null,
                    "attribution": null
                }
            ]
        }
    ]
}

  3. if link points to a bibfile, encourage users to store licence information in the note field, which is displayed by default, e.g.:
@misc{huelk2022,
    author = {Hülk, Ludwig and Pleßmann, Guido and Muschner, Christoph and Kotthoff, Florian and Tepe, Deniz},
    title = {open-MaStR - Marktstammdatenregister},
    DOI = {10.5281/zenodo.6807426},
    publisher = {Zenodo},
    year = {2022},
    month = {Jul},
    note = {License information: Marktstammdatenregister - © Bundesnetzagentur für Elektrizität, Gas, Telekommunikation, Post und Eisenbahnen | DL-DE-BY-2.0}
}
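
If licence information is stored in the note field as suggested, it could be read back per entry with a small helper. The sketch below uses two plain regular expressions rather than a BibTeX library, so it only handles simple entries like the one above (brace-delimited values without nested braces); `ENTRY_KEY`, `NOTE_FIELD` and `licence_note` are illustrative names.

```python
import re

# Entry key: the token between "@type{" and the first comma.
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
# note field with a brace-delimited value; nested braces are not supported.
NOTE_FIELD = re.compile(r"\bnote\s*=\s*\{([^{}]*)\}", re.IGNORECASE)


def licence_note(bib_entry):
    """Return (bibkey, note text) for one BibTeX entry.

    Either element is None if it cannot be found.
    """
    key_match = ENTRY_KEY.search(bib_entry)
    note_match = NOTE_FIELD.search(bib_entry)
    key = key_match.group(1) if key_match else None
    note = note_match.group(1).strip() if note_match else None
    return key, note


entry = """@misc{huelk2022,
    title = {open-MaStR - Marktstammdatenregister},
    DOI = {10.5281/zenodo.6807426},
    note = {License information: Marktstammdatenregister - © Bundesnetzagentur für Elektrizität, Gas, Telekommunikation, Post und Eisenbahnen | DL-DE-BY-2.0}
}"""
key, note = licence_note(entry)
```

The returned key can be matched against the source column of the data, and the note text shown next to it. For real bibfiles a proper BibTeX parser would be more robust than these patterns.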


chrwm avatar Aug 07 '23 15:08 chrwm

We just discussed the current state and possible solutions and concluded that we will not implement an update in the upcoming version 2.0.

Ludee avatar Oct 16 '24 13:10 Ludee