oemetadata
There is no concept to connect sources in the OEP
If a table comes from a single source, it is fairly simple to connect the source to the original data.
As soon as a table has two or more sources, the metadata offers no way to selectively connect individual pieces of information to the different sources.
When a table has a source column, there should be a way to link something like a bibtex key and either reference the source (by DOI or an absolute source path) or provide the source yourself. This can then be referenced in the metadata.
There is a deprecated section for literature at the OEP. But other projects do it better:
- https://scholar.google.de/
- https://www.base-search.net/about/de/
- https://www.science.gov/
- http://www.jurn.org/#gsc.tab=0
Solution A:
- Each source in the metadata has a key entry = bibkey
- Each source in the metadata has an additional doi
- In the data, there is a column source/bibkey linking to the metadata
Repeating this to make sure I understand correctly: This will need two new keys in the metadata sources object: 'bibkey' and 'doi'. The value of 'bibkey' can also be found in a cell of the data in a column named 'source' or 'bibkey'. The value of 'doi' is only a doi, so it needs to be connected directly to the 'bibkey'. The structure needs to make sure that only one 'doi' belongs to one 'bibkey'. Did I understand correctly, @Ludee?
Does this also mean that with this solution only one source can be referenced per row? Would it make sense to recommend an extra column in the data that contains the doi? It would make it possible to match bibkey and doi automatically for the metadata. That function would still need to be written, of course, but it might make creating the sources for the metadata considerably easier.
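A minimal sketch of the automatic matching asked about above, assuming the data carries both a 'bibkey' and a 'doi' column and that rows are available as dictionaries. The helper name `build_sources` and the sample rows are hypothetical and not part of any existing OEP tooling.

```python
def build_sources(rows):
    """Collect one metadata source entry per bibkey found in the data rows.

    Raises ValueError if the same bibkey appears with two different DOIs,
    which enforces the 'one doi per bibkey' rule discussed above.
    """
    doi_by_key = {}
    for row in rows:
        key, doi = row["bibkey"], row["doi"]
        # setdefault stores the first DOI seen for a key and returns it afterwards.
        known = doi_by_key.setdefault(key, doi)
        if known != doi:
            raise ValueError(f"bibkey {key!r} maps to both {known!r} and {doi!r}")
    return [{"bibkey": key, "doi": doi} for key, doi in doi_by_key.items()]


# Hypothetical rows; only the 'bibkey' and 'doi' columns matter here.
rows = [
    {"id": 1, "bibkey": "Vartiainen2019", "doi": "10.1002/pip.3189"},
    {"id": 2, "bibkey": "Vartiainen2019", "doi": "10.1002/pip.3189"},
    {"id": 3, "bibkey": "huelk2022", "doi": "10.5281/zenodo.6807426"},
]
sources = build_sources(rows)
```

Several rows may share a bibkey, but each row still references exactly one source in this layout.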
In modex we use the description field within a source (see the example below) to add a table identifier and a source identifier to the metadata. We use the bibtex key as the identifier, but that is not strict: it could be any identifier value (e.g. a primary key id) that refers to a specific row in the data tables. That way, we connect the source in the metadata with the bibtex file (if available). As we use the oedatamodel, we have a source column in the table. We insert the full bibtex citation and/or key there, which also connects the sources in the OEMetadata to the database table row (by id) and/or bibtex key. This is not perfect yet, as we currently need to provide the bibtex file within the datapackage, which is not compatible with the OEP. For future projects I would recommend adding the full citation text and the bibtex key (as a backup) in the source column for each row.
Example:
OEMetadata 1.4.1
...
"sources": [
  {
    "title": "Impact of weighted average cost of capital, capital expenditure, and other parameters on future utility-scale {PV} levelised cost of electricity",
->  "description": "[oed-table:scalar],[Bibtexkey:Vartiainen2019] - Impact of weighted average cost of capital, capital expenditure, and other parameters on future utility-scale PV levelised cost of electricity Progress in Photovoltaics: Research and Applications",
    "path": "10.1002/pip.3189",
    "licenses": [
      ...
    ]
  }
]
...
In the source column, the same bibtex key or full citation can be inserted multiple times in different rows.
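The bracketed identifiers in the description field above can be read back programmatically. The following is a rough sketch, assuming the tags always follow the `[name:value]` pattern seen in the example; the function name `parse_description_tags` is made up for illustration.

```python
import re

# Matches tags such as [oed-table:scalar] or [Bibtexkey:Vartiainen2019].
TAG_PATTERN = re.compile(r"\[([^\[\]:]+):([^\[\]]+)\]")


def parse_description_tags(description):
    """Return all [name:value] tags found in a source description as a dict."""
    return {
        name.strip(): value.strip()
        for name, value in TAG_PATTERN.findall(description)
    }


description = (
    "[oed-table:scalar],[Bibtexkey:Vartiainen2019] - "
    "Impact of weighted average cost of capital"
)
tags = parse_description_tags(description)
```

With the key recovered this way, a metadata source can be matched against the value stored in the source column of the table named by the `oed-table` tag.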
This issue may need further discussion to find a convenient solution. Hence, it will not yet be considered in oemetadata release v1.5
The proposed workaround seems to work well for projects that work with bib files, but introducing the key bibkey might deserve a second thought.
FYI @srhbrnds @henhuy
Problem scope:
data & metadata = datapackage
- keep data & metadata user-friendly
Requirements on data & metadata (please add/edit):
- identify data sources for each data point
- identify the licence and usage rights of the complete datapackage easily, i.e. it should be easy to find the licence information of the overall dataset and how to use it
- ensure that the licences and rights of the sources that make up the dataset can be identified, i.e. the licence information of each individual source of the dataset should be traceable
example table 1:
id | region | year | storage_capacity | charging_power | fixed_cost | variable_cost | investment_cost | operational_life_time | mileage | bandwidth_type | version | method | source | comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Germany | 2019 | 35 | 4 | 0.027 | 0.044 | 35000 | 12 | 12127 | {'market_share': 'range'} |  |  | {'storage_capacity':'LucadeTena2018', 'charging_power':'LucadeTena2018', 'fixed_cost':'ADAC2023', 'variable_cost':'ADAC2023', 'investment_cost':'ADAC2023', 'operational_life_time':'deTena2018', 'mileage':'motointegrator2023', 'occupancy_rate':'InstitutfuerangewandteSozialwissenschaftGmbH2018', 'market_share':'ownAssumptions', 'charging_efficiency' :'ownAssumptions', 'feed_in_efficiency':'ownAssumptions', 'energy_conversion_efficiency':'LucadeTena2018'} |  |
2 | Germany | 2025 | 35 | 4 | 0.027 | 0.044 | 35000 | 12 | 12127 | {'market_share': 'range'} |  |  | {'storage_capacity':'deTena2018', 'charging_power':'deTena2018', 'fixed_cost':'ADAC2023', 'variable_cost':'ADAC2023', 'investment_cost':'ADAC2023', 'operational_life_time':'deTena2018', 'mileage':'motointegrator2023', 'occupancy_rate':'Mueller2013', 'market_share':'ownAssumptions', 'charging_efficiency' :'ownAssumptions', 'feed_in_efficiency':'ownAssumptions', 'energy_conversion_efficiency':'deTena2018'} |  |
3 | Germany | 2030 | 35 | 5.5 | 0.03 | 0.044 | 32000 | 12 | 12127 | {'market_share': 'range'} |  |  | {'storage_capacity':'deTena2018', 'charging_power':'deTena2018', 'fixed_cost':'ADAC2023', 'variable_cost':'ADAC2023', 'investment_cost':'ADAC2023', 'operational_life_time':'deTena2018', 'mileage':'motointegrator2023', 'occupancy_rate':'Mueller2013', 'market_share':'ownAssumptions', 'charging_efficiency' :'ownAssumptions', 'feed_in_efficiency':'ownAssumptions', 'energy_conversion_efficiency':'deTena2018'} |  |
example table 2:
id | scenario_id | region | year | input_energy_vector | output_energy_vector | technology | technology_type | parameter_name | value | unit | tags | method | source | comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1909 | 2 | ["Baltic"] | 2016 | air | electricity | wind turbine | offshore | installed capacity | 338.8 | MW |  |  | WirtschaftundEnergieCap | Dez 15 |
1910 | 2 | ["North"] | 2016 | air | electricity | wind turbine | offshore | installed capacity | 2956.1 | MW |  |  | WirtschaftundEnergieCap | Dez 15 |
1911 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH"] | 2016 | air | electricity | wind turbine | onshore | capital costs | 1288000 | €/MW |  | {"value": "Interpolation 2015-2020"} | DEA2020 | p224 |
1912 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH"] | 2016 | air | electricity | wind turbine | onshore | fixed costs | 23280 | €/MW/a |  | {"value": "Interpolation 2015-2020"} | DEA2020 | p224 |
1913 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH"] | 2016 | air | electricity | wind turbine | onshore | lifetime | 25.4 | years |  | {"value": "Interpolation 2015-2020"} | DEA2020 | p224 |
1914 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH","North","Baltic"] | 2016 | air | electricity | wind turbine | offshore | capital costs | 2714000 | €/MW |  | {"value": "Interpolation 2015-2020"} | DEA2020 | p245 |
1915 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH","North","Baltic"] | 2016 | air | electricity | wind turbine | offshore | fixed costs | 53851.8 | €/MW/a |  | {"value": "Interpolation 2015-2020"} | DEA2020 | p245 |
1916 | 2 | ["BB","BE","BW","BY","HB","HE","HH","MV","NI","NW","RP","SH","SL","SN","ST","TH","North","Baltic"] | 2016 | air | electricity | wind turbine | offshore | lifetime | 25.4 | years |  | {"value": "Interpolation 2015-2020"} | DEA2020 | p245 |
2133 | 2 | ["BB"] | 2016 | air | electricity | wind turbine | onshore | installed capacity | 5700.03 | MW |  |  | MaStR2021 |  |
2134 | 2 | ["BE"] | 2016 | air | electricity | wind turbine | onshore | installed capacity | 11 | MW |  |  | MaStR2021 |  |
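The two example tables use the source column differently: table 1 stores a dictionary that maps column names to bibkeys, table 2 stores a single bibkey per row. A possible way to collect every bibkey used in a table is sketched below; the function names are hypothetical, and parsing the cell with `ast.literal_eval` is an assumption about how the dictionary is serialised.

```python
import ast


def bibkeys_in_cell(cell):
    """Return the set of bibkeys referenced by one 'source' cell."""
    cell = cell.strip()
    if not cell:
        return set()
    if cell.startswith("{"):
        # Dictionary form: {'column_name': 'bibkey', ...}
        mapping = ast.literal_eval(cell)
        return set(mapping.values())
    # Plain form: the cell holds a single bibkey.
    return {cell}


def bibkeys_in_table(source_cells):
    """Union of all bibkeys over the 'source' cells of a table."""
    keys = set()
    for cell in source_cells:
        keys |= bibkeys_in_cell(cell)
    return keys


cells = [
    "{'storage_capacity':'LucadeTena2018', 'fixed_cost':'ADAC2023'}",
    "DEA2020",
    "",
]
used = bibkeys_in_table(cells)
```

The resulting set could then be compared with the keys available in a bibfile to spot data points whose source is not documented.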
Proposed solution for OEM-1.6 or later:
- Introduce the key bibSources and keep the old structure within another key, e.g. library
Pro:
- users have all sources structured in a bibfile
- with the example tables above, individual data points are traceable to individual sources
- users who don't work with bibfiles can still document their sources
Con:
- licence information of individual sources would be shifted to the bibfile and not be directly visible in the metadata (not a con if this is not a requirement)
"sources": {
  "bibSources": "http://url_to_bib_file_with_bib_file",
  "library": [
    {
      "title": null,
      "description": null,
      "path": null,
      "licenses": [
        {
          "name": null,
          "title": null,
          "path": null,
          "instruction": null,
          "attribution": null
        }
      ]
    },
    {
      "title": null,
      "description": null,
      "path": null,
      "licenses": [
        {
          "name": null,
          "title": null,
          "path": null,
          "instruction": null,
          "attribution": null
        }
      ]
    }
  ]
}
Licence information and rights of usage for the entire datapackage are still easily accessible in
"licenses": [
  {
    "name": null,
    "title": null,
    "path": null,
    "instruction": null,
    "attribution": null
  }
],
There is no designated bibfile field for licences. The bibfile field note is shown by default in the bibliography and could be used for licence information.
I like the idea suggested by @chrwm. This would completely separate the recommended way of source management from the metadata. Of course, the current solution would still be available.
I will add concerns that might be relevant. Perhaps we decide to include your solution first and try to resolve the issues later.
One objection for me would be that we would require all users to use the Bibtex format if they want to cite sources and link them efficiently in the data. (Maybe this is not even bad)
Another point is that if we keep the current format and use the Bibtex format, we would have to handle two formats if we want to display the sources in the OEP, for example. We would have to save the Bibtex file and read from it to display the source information on the website. (This is feasible but extra work)
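To illustrate the extra work of handling two formats, here is a small sketch of a reader that accepts both the current list layout and the proposed object layout with `bibSources` and `library`. The function name `split_sources` is hypothetical, and fetching and parsing the bibfile itself is left out.

```python
def split_sources(metadata):
    """Return (bib_url, source_entries) for either layout of 'sources'.

    Current layout:  "sources": [ {...}, {...} ]
    Proposed layout: "sources": {"bibSources": "<url>", "library": [ {...} ]}
    """
    sources = metadata.get("sources") or []
    if isinstance(sources, dict):
        return sources.get("bibSources"), sources.get("library") or []
    return None, sources


current = {"sources": [{"title": "A", "path": "10.1002/pip.3189"}]}
proposed = {
    "sources": {
        "bibSources": "http://url_to_bib_file_with_bib_file",
        "library": [{"title": "A"}],
    }
}
```

Code that displays sources would call `split_sources` once and then only needs a second code path when a bibfile URL is actually present.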
Regarding the objection that all users would be required to use the Bibtex format: in my solution, the current way of handling sources and the new key bibSources would exist in parallel, so people could use both.
I agree with the other concern.
Okay, I thought the use case you presented would add more features, but it is aimed at usability - then there is no concern :)
From today's meeting:
A link to bibSources seems to be accepted as an extra OEM key.
The question of whether one should be able to see the licences of the individual sources in OEMetadata in addition to the total licence of the resource (without having to look into the source) is still open.
To pick things up again - I propose to implement the following solution:
- add a link field that links to a file containing the sources, e.g. a bibfile.
- move the current resource sources into another field, e.g. individual
"sources": {
  "link": "http://url_to_bib_file_with_bib_file",
  "individual": [
    {
      "title": null,
      "description": null,
      "path": null,
      "licenses": [
        {
          "name": null,
          "title": null,
          "path": null,
          "instruction": null,
          "attribution": null
        }
      ]
    },
    {
      "title": null,
      "description": null,
      "path": null,
      "licenses": [
        {
          "name": null,
          "title": null,
          "path": null,
          "instruction": null,
          "attribution": null
        }
      ]
    }
  ]
}
- if link points to a bibfile, encourage users to save licence information in the note field, which is displayed by default, e.g.:
@misc{huelk2022,
  author = {Hülk, Ludwig and Pleßmann, Guido and Muschner, Christoph and Kotthoff, Florian and Tepe, Deniz},
  title = {open-MaStR - Marktstammdatenregister},
  DOI = {10.5281/zenodo.6807426},
  publisher = {Zenodo},
  year = {2022},
  month = {Jul},
  note = {License information: Marktstammdatenregister - © Bundesnetzagentur für Elektrizität, Gas, Telekommunikation, Post und Eisenbahnen | DL-DE-BY-2.0}
}
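If licence information is kept in the note field as in the entry above, it could be read back per bibkey. The sketch below uses two simple regular expressions rather than a real BibTeX parser and assumes that the note value contains no nested braces and that no "@" followed by a brace appears inside field values. The function name `notes_by_bibkey` and the second entry are hypothetical.

```python
import re

# Start of an entry, e.g. "@misc{huelk2022," -> groups: entry type, bibkey.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A note field whose value is a single brace group without nested braces.
NOTE_FIELD = re.compile(r"\bnote\s*=\s*\{([^{}]*)\}", re.IGNORECASE)


def notes_by_bibkey(bibtex_text):
    """Map each bibkey to the text of its note field (None if absent)."""
    starts = list(ENTRY_START.finditer(bibtex_text))
    notes = {}
    for index, match in enumerate(starts):
        # An entry body runs until the next entry starts (or the text ends).
        end = starts[index + 1].start() if index + 1 < len(starts) else len(bibtex_text)
        body = bibtex_text[match.end():end]
        note = NOTE_FIELD.search(body)
        notes[match.group(2)] = note.group(1).strip() if note else None
    return notes


bibtex_text = """
@misc{huelk2022,
  title = {open-MaStR - Marktstammdatenregister},
  DOI = {10.5281/zenodo.6807426},
  note = {License information: Marktstammdatenregister - © Bundesnetzagentur für Elektrizität, Gas, Telekommunikation, Post und Eisenbahnen | DL-DE-BY-2.0}
}
@misc{nolicence2020,
  title = {Hypothetical entry without a note}
}
"""
notes = notes_by_bibkey(bibtex_text)
```

Entries that come back as None are the ones for which the licence of an individual source cannot be seen without opening the source itself.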
We just discussed the current state and possible solutions and came to the conclusion we will not implement an update in the upcoming version 2.0.