Units and scales (and currency) in Table Schema
STATUS:
- We had a Units spec (see #35), which was then moved out, but we are now moving it back in: see https://github.com/frictionlessdata/specs/issues/537
- We then can reference this from Table Schema - this issue here
Excellent discussion with @dr-shorthair today led me to consider the importance of units and scales (and currency) in JSON Table Schema.
Suggest we could specify at MAY level:
unit: simple-string-descriptor e.g. m/s
unitSemantic: pointer-to-a-url-describing that unit - could be RDF uri
currency: # could be part of units but think probably better separate
factor: a scaling factor (e.g. 1000 would mean to scale by 1000)
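To make the proposal concrete, here is a minimal Python sketch of how a consumer might read these properties. The property names `unit`, `unitSemantic` and `factor` come from the list above; the field name, the placeholder URL and the helpers `scaled_value` and `describe` are hypothetical, and none of this is part of a published spec.

```python
# Hypothetical field descriptor using the property names proposed above.
field = {
    "name": "speed",  # hypothetical field name
    "type": "number",
    "unit": "m/s",
    "unitSemantic": "http://example.org/units/metre-per-second",  # placeholder URL
    "factor": 1000,
}


def scaled_value(raw, descriptor):
    """Multiply a raw cell value by the field's factor (1 if none is declared)."""
    return raw * descriptor.get("factor", 1)


def describe(raw, descriptor):
    """Return the scaled value as text, followed by the unit if one is declared."""
    value = scaled_value(raw, descriptor)
    unit = descriptor.get("unit")
    return f"{value} {unit}" if unit else str(value)
```

Whether `factor` should multiply or divide the stored value is not settled in this thread; the sketch assumes multiplication.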
References
- QUDT - http://www.qudt.org/ - quantity, unit, dimensions and types
- Unified code of units of measure - http://unitsofmeasure.org/
- Data Protocol on Units - http://dataprotocols.org/units/
Pint is a great Python library for units. http://pint.readthedocs.org/
Thinking about this through the lens of a Fiscal Data Package profile, a mapping object has been used to give semantic meaning to raw numbers in a budget dataset. As an example, currency is a type currently applied on a field by mapping a source JTS column onto a new field in a mapping object. I'm wondering: should there be a general principle for applying semantic meaning to columns in a CSV, or should we consider the FDP a special case?
Related:
- https://github.com/dataprotocols/dataprotocols/issues/67
@pwalsh I know - though I'm wondering if that was a good idea vs proper units. Note also we did not support "factor" ;-)
OK, I think we should introduce units and factor. Re units the question I would have is to understand any difference between QUDT and dataprotocols units spec.
@danfowler could you take a quick look at QUDT and the units spec and see if you can identify any differences?
@rgrp I can take a look.
I would suggest handling currency separate from units of measure, but in the same overall framework along with controlled vocabularies and coordinate reference systems. These are all 'reference systems'.
The special thing about currency is that conversion factors are time-dependent, and the changes are large. This does not apply to typical uom.
There is also some time-dependency in both spatial and temporal coordinate systems due to (a) a moving spatial datum due to plate tectonics (yes, this does matter in applications like precision agriculture) and (b) leap seconds, though in both cases most users would not notice.
@rgrp @danfowler any progress here?
@dr-shorthair great points. I'm wondering, though, if the conversion aspects you highlight are relevant for the spec itself (rather than relevant for potential applications of the spec).
Great discussion! Just wanted to chime in that I think this would be helpful for CSV columns as well :)
@rgrp do you want to move forward on this?
Would that look something like the following?
"schema": {
  "fields": [
    {
      "name": "Year",
      "description": "Year",
      "type": "date"
    },
    {
      "name": "Total",
      "description": "Total carbon emissions from fossil fuel consumption and cement production (million metric tons of C)",
      "type": "number",
      "unit": "Mt",
      "unitSystem": "SI"
    }
  ]
[…]
@rgieseke yes - that is correct. Your unitSystem is an addition by you, I assume? And is unit a reference to the dataprotocols units spec or a different one?
@pwalsh next steps here would be:
- Deciding what exactly to add, e.g. factor and unit
- Deciding where to add it - I'm thinking this is more of a pattern or extension rather than core ...
@rgrp Yes, sorry I mis-remembered units and unitsSemantic. Why would it be units as plural though?
@rgieseke units was a typo which I have corrected - should be unit.
We are planning to use table schema for describing the inner structure of our resources. But we definitely need to store the unit of measurement. Thus, we would very much welcome it if the table schema spec supported this, so that we wouldn't have to work with custom addons.
@muehlenpfordt et al at Open Power System Data seem to have produced Data Packages with a unit: attribute at the field level with a string value (e.g. "MW"). I'd be curious to learn what use case that supports in that project.
https://github.com/Open-Power-System-Data/renewable_power_plants/blob/master/validation_and_output.ipynb
I also went with unit for each table column
https://github.com/openclimatedata/global-carbon-budget/blob/master/datapackage.json#L59
I think the main use case is to easily read in a data set and apply a unit transformation, e.g. for comparison with another dataset.
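A minimal Python sketch of that use case, assuming a field-level `unit` string as used in the packages linked above. The conversion table `TO_WATTS`, the helpers `convert` and `harmonise`, and the `capacity` column in the usage note are all hypothetical; a real tool would more likely delegate to a units library.

```python
# Hypothetical conversion factors from each unit to a common base (watts).
TO_WATTS = {"W": 1.0, "kW": 1e3, "MW": 1e6, "GW": 1e9}


def convert(value, from_unit, to_unit, table=TO_WATTS):
    """Convert a value between two units that share the same base in `table`."""
    if from_unit not in table or to_unit not in table:
        raise ValueError(f"cannot convert {from_unit!r} to {to_unit!r}")
    return value * table[from_unit] / table[to_unit]


def harmonise(rows, field, target_unit):
    """Return copies of `rows` with one column converted to `target_unit`.

    `field` is a Table Schema style field descriptor carrying a `unit` key.
    """
    source_unit = field.get("unit")
    if source_unit is None:
        raise ValueError(f"field {field.get('name')!r} declares no unit")
    name = field["name"]
    return [{**row, name: convert(row[name], source_unit, target_unit)} for row in rows]
```

For example, `harmonise(rows, {"name": "capacity", "unit": "MW"}, "kW")` would rescale a `capacity` column so it can be compared with a dataset published in kW.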
For us, the use case is to clarify the unit of measurement, i.e. whether the numbers in the columns should be read as Megawatts MW, Kilowatts kW, Megawatthours MWh, etc. For the case of currencies, we will put EURO or DKK etc.
@muehlenpfordt how do you indicate a currency unit? Do you have a specific prefix or ...?
I haven't implemented it yet, but I had thought putting "unit": "EUR" (using ISO 4217 currency code) would make it sufficiently clear that the column contains currency data. Now I saw there was the suggestion to have an additional attribute currency. Would that mean I put
"unit": "currency",
"currency": "EUR"
?
That seems a bit redundant. On the other hand it might help considering the amount of different currencies.
What do you think? I am open to your suggestions.
How about something like
"unit": "EUR",
"unitSystem": "ISO-4217"
@rgieseke really like that approach.
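One way a tool might act on the `unit` plus `unitSystem` pairing suggested above is sketched below in Python. The registry `KNOWN_UNIT_SYSTEMS`, its tiny sample of ISO 4217 codes, the helper names and the returned status strings are assumptions made for illustration; nothing here is defined by Table Schema.

```python
# Tiny, non-exhaustive sample of ISO 4217 codes, for illustration only.
KNOWN_UNIT_SYSTEMS = {
    "ISO-4217": {"EUR", "DKK", "USD"},
}


def check_unit(field, systems=KNOWN_UNIT_SYSTEMS):
    """Classify a field's unit declaration against a registry of unit systems."""
    unit = field.get("unit")
    system = field.get("unitSystem")
    if unit is None:
        return "no-unit"
    if system is None:
        return "no-system"
    if system not in systems:
        return "unknown-system"
    return "ok" if unit in systems[system] else "unknown-unit"


def is_currency(field):
    """Treat a field as currency data when its unit system is ISO 4217."""
    return field.get("unitSystem") == "ISO-4217"
```

With this pairing a separate `currency` property is not needed: the unit system alone tells a consumer that the column holds currency data.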
I think this is mature enough to become a pattern.
I also think we may want to move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs /cc @danfowler @pwalsh - now #537
We have a use case for this in biotracks, see https://github.com/CellMigStandOrg/biotracks/issues/9
I have a question. Will there be any specified way of converting measurements from one unit to another? Say celsius to kelvin or fahrenheit. Or is this outside the scope of the spec?
@Kenji-K this would be outside of the spec - it would be something a tool would implement (but the spec could form the basis for that tool's API)
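As a rough sketch of what such a tool could do, the Python below converts between temperature scales by going through kelvin. The unit names used as keys and the function `convert_temperature` are hypothetical; the spec discussed here defines no conversion API.

```python
# Conversions to and from kelvin; the unit names used as keys are hypothetical.
TO_KELVIN = {
    "kelvin": lambda v: v,
    "celsius": lambda v: v + 273.15,
    "fahrenheit": lambda v: (v + 459.67) * 5 / 9,
}
FROM_KELVIN = {
    "kelvin": lambda k: k,
    "celsius": lambda k: k - 273.15,
    "fahrenheit": lambda k: k * 9 / 5 - 459.67,
}


def convert_temperature(value, from_unit, to_unit):
    """Convert a temperature by first mapping it to kelvin, then to the target."""
    if from_unit not in TO_KELVIN or to_unit not in FROM_KELVIN:
        raise ValueError(f"unsupported conversion: {from_unit!r} to {to_unit!r}")
    return FROM_KELVIN[to_unit](TO_KELVIN[from_unit](value))
```

Temperature is a useful test case because the conversion is not a plain multiplication, so a single `factor` property could not express it.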
@pwalsh @roll
Hey, is there still interest in this feature? We (for the French administration) would use it: basically, tools consuming table schema would infer some behavior according to the unit, when defined. Is there any way we could help it land in the spec?
Yes, also interested, to use it for Camtrap DP. Although one can of course expand the Frictionless Table Schema as they want (e.g. adding a unit property for each field) I’d also rather have this as part of the core Table Schema itself.
@yohanboniface yes a lot of interest. First start would be a detailed pattern. Note @Stephen-Gates had a go at that in https://github.com/frictionlessdata/specs/pull/607 - we are really open to getting a pattern and then turning that into part of the spec.