Units and scales (and currency) in Table Schema

Open rufuspollock opened this issue 10 years ago • 31 comments

STATUS:

  • We had a Units spec (see #35) which was moved out, but we're now moving it back in: see https://github.com/frictionlessdata/specs/issues/537
  • We then can reference this from Table Schema - this issue here

Excellent discussion with @dr-shorthair today led me to consider the importance of units and scales (and currency) in JSON Table Schema.

Suggest we could specify at MAY level:

unit: simple-string-descriptor e.g. m/s
unitSemantic: pointer-to-a-url-describing that unit - could be RDF uri
currency:      # could be part of units but think probably better separate
factor: a scaling factor (e.g. 1000 would mean to scale by 1000)
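
For illustration, a minimal sketch of how a consuming tool might read these properties. It assumes that factor means "multiply the stored value by this number"; the field name, the placeholder URI and the helper functions are hypothetical, and none of these properties are part of Table Schema yet.

```python
# Hypothetical field descriptor using the property names proposed above.
# These properties are a proposal only, not part of Table Schema.
field = {
    "name": "speed",
    "type": "number",
    "unit": "m/s",
    "unitSemantic": "http://example.org/units/metre-per-second",  # placeholder URI
    "factor": 1000,
}


def scaled_value(raw, field):
    """Multiply a raw cell value by the field's factor (assumed default: 1)."""
    return raw * field.get("factor", 1)


def describe(raw, field):
    """Format the scaled value together with its unit string, if one is declared."""
    value = scaled_value(raw, field)
    unit = field.get("unit")
    return f"{value} {unit}" if unit else str(value)
```

With this reading, a stored value of 3 in the field above would be presented as 3000 m/s, and a field without unit or factor is passed through unchanged.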

References

  • QUDT - http://www.qudt.org/ - quantity, unit, dimensions and types -
  • Unified code of units of measure - http://unitsofmeasure.org/
  • Data Protocol on Units - http://dataprotocols.org/units/

rufuspollock avatar Sep 24 '15 14:09 rufuspollock

Pint is a great Python library for units. http://pint.readthedocs.org/

s-celles avatar Sep 24 '15 14:09 s-celles

Thinking about this through the lens of a Fiscal Data Package profile, a mapping object has been used to give semantic meaning to raw numbers in a budget dataset. As an example, currency is a type currently applied to a field by mapping a source JTS column onto a new field in a mapping object. I'm wondering: should there be a general principle for applying semantic meaning to columns in a CSV, or should we consider the FDP a special case?

Related:

  • https://github.com/dataprotocols/dataprotocols/issues/67

danfowler avatar Dec 01 '15 08:12 danfowler

@danfowler JTS already supports currency as a format on number:

pwalsh avatar Dec 01 '15 08:12 pwalsh

@pwalsh I know - though I'm wondering if that was a good idea vs proper units. Note also we did not support "factor" ;-)

rufuspollock avatar Dec 01 '15 09:12 rufuspollock

OK, I think we should introduce unit and factor. Re units, the question I have is whether there is any difference between QUDT and the dataprotocols units spec.

@danfowler could you take a quick look at QUDT and the units spec and see if you can identify any differences?

rufuspollock avatar Feb 25 '16 09:02 rufuspollock

@rgrp I can take a look.

danfowler avatar Feb 25 '16 11:02 danfowler

I would suggest handling currency separate from units of measure, but in the same overall framework along with controlled vocabularies and coordinate reference systems. These are all 'reference systems'.

The special thing about currency is that conversion factors are time-dependent, and the changes are large. This does not apply to typical uom.

There is also some time-dependency in both spatial and temporal coordinate systems due to (a) the spatial datum moving because of plate tectonics - yes, this does matter in applications like precision agriculture; (b) leap seconds. In both cases, though, most users would not notice.

dr-shorthair avatar Feb 25 '16 21:02 dr-shorthair

@rgrp @danfowler any progress here?

@dr-shorthair great points. I'm wondering, though, if the conversion aspects you highlight are relevant for the spec itself (rather than relevant for potential applications of the spec).

pwalsh avatar Mar 07 '16 06:03 pwalsh

Great discussion! Just wanted to chime in that I think this would be helpful for CSV columns as well :)

patcon avatar May 26 '16 05:05 patcon

@rgrp do you want to move forward on this?

pwalsh avatar Jul 12 '16 09:07 pwalsh

Would that look something like the following?

"schema": {
  "fields": [
    {
      "name": "Year",
      "description": "Year",
      "type": "date"
    },
    {
      "name": "Total",
      "description": "Total carbon emissions from fossil fuel consumption and cement production (million metric tons of C)",
      "type": "number",
      "unit": "Mt",
      "unitSystem": "SI"
    }
  ]
}

[…]

rgieseke avatar Jul 28 '16 21:07 rgieseke

@rgieseke yes - that is correct. Your unitSystem is an addition by you, I assume? And is unit a reference to the dataprotocols units spec or a different one?

rufuspollock avatar Aug 09 '16 08:08 rufuspollock

@pwalsh next steps here would be:

  • Deciding what exactly to add. e.g. factor and unit
  • Deciding where to add it - I'm thinking this is more of a pattern or extension rather than core ...

rufuspollock avatar Aug 09 '16 08:08 rufuspollock

@rgrp Yes, sorry I mis-remembered units and unitsSemantic. Why would it be units as plural though?

rgieseke avatar Aug 09 '16 09:08 rgieseke

@rgieseke units was a typo which I have corrected - should be unit.

rufuspollock avatar Aug 11 '16 13:08 rufuspollock

We are planning to use Table Schema to describe the inner structure of our resources, but we definitely need to store the unit of measurement. We would therefore very much welcome it if the Table Schema spec supported this, so that we don't have to work with custom add-ons.

rnuske avatar May 04 '17 13:05 rnuske

@muehlenpfordt et al at Open Power System Data seem to have produced Data Packages with a unit: attribute at the field level with a string value (e.g. "MW"). I'd be curious to learn what use case that supports in that project.

https://github.com/Open-Power-System-Data/renewable_power_plants/blob/master/validation_and_output.ipynb

danfowler avatar Jun 08 '17 07:06 danfowler

I also went with unit for each table column

https://github.com/openclimatedata/global-carbon-budget/blob/master/datapackage.json#L59

I think the main use case is to easily read in a data set and apply a unit transformation, e.g. for comparison with another dataset.
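
A minimal sketch of that use case, assuming the field descriptor carries a unit string and the consuming tool ships its own conversion table. The table, the helper names and the sample values are hypothetical and not part of any spec.

```python
# Hypothetical conversion table to a common base (tonnes); hand-written for
# illustration and not defined by Table Schema or the units spec.
TO_TONNES = {"t": 1.0, "kt": 1e3, "Mt": 1e6, "Gt": 1e9}

schema = {
    "fields": [
        {"name": "Year", "type": "date"},
        {"name": "Total", "type": "number", "unit": "Mt"},
    ]
}


def unit_of(schema, field_name):
    """Return the unit string declared for a field, or None if it has none."""
    for field in schema["fields"]:
        if field["name"] == field_name:
            return field.get("unit")
    raise KeyError(field_name)


def convert(values, from_unit, to_unit):
    """Convert a column of numbers between two units listed in TO_TONNES."""
    factor = TO_TONNES[from_unit] / TO_TONNES[to_unit]
    return [v * factor for v in values]


totals_mt = [9000.0, 500.0]  # made-up values, not real emissions data
totals_gt = convert(totals_mt, unit_of(schema, "Total"), "Gt")
```

The point is only that the unit string in the schema lets a tool pick the conversion without the user reading the description text.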

rgieseke avatar Jun 08 '17 08:06 rgieseke

For us, the use case is to clarify the unit of measurement, i.e. whether the numbers in the columns should be read as Megawatts MW, Kilowatts kW, Megawatthours MWh, etc. For the case of currencies, we will put EURO or DKK etc.

jgmill avatar Jun 08 '17 09:06 jgmill

@muehlenpfordt how do you indicate a currency unit? Do you have a specific prefix or ...?

rufuspollock avatar Jun 09 '17 10:06 rufuspollock

I haven't implemented it yet, but I had thought putting "unit": "EUR" (using ISO 4217 currency code) would make it sufficiently clear that the column contains currency data. Now I saw there was the suggestion to have an additional attribute currency. Would that mean I put

"unit": "currency",
"currency": "EUR"

?

That seems a bit redundant. On the other hand it might help considering the amount of different currencies.

What do you think? I am open to your suggestions.

jgmill avatar Jun 12 '17 14:06 jgmill

How about something like

"unit": "EUR",
"unitSystem": "ISO-4217"
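
As a rough sketch of what a tool could do with that pair, assuming unitSystem names a registry that the tool knows about. The registry below holds only a small sample of ISO 4217 codes, and the function names and return values are hypothetical.

```python
# Small sample of known codes per unit system; a real tool would need the
# full ISO 4217 list. This registry is an assumption for illustration.
KNOWN_UNITS = {
    "ISO-4217": {"EUR", "DKK", "USD"},
}


def is_currency_field(field):
    """Treat a field as currency data when its unitSystem is ISO-4217."""
    return field.get("unitSystem") == "ISO-4217"


def check_unit(field):
    """Return 'ok', 'unknown-unit', or 'unchecked' for a field descriptor."""
    known = KNOWN_UNITS.get(field.get("unitSystem"))
    if known is None:
        # No unitSystem given, or one this tool has no registry for.
        return "unchecked"
    return "ok" if field.get("unit") in known else "unknown-unit"
```

This avoids the redundancy of a separate currency attribute: the unit system alone tells the tool that "EUR" is a currency code.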

rgieseke avatar Jun 12 '17 15:06 rgieseke

@rgieseke really like that approach.

I think this is mature enough to become a pattern.

I also think we may want to move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs /cc @danfowler @pwalsh - now #537

rufuspollock avatar Jun 14 '17 08:06 rufuspollock

We have a use case for this in biotracks, see https://github.com/CellMigStandOrg/biotracks/issues/9

simleo avatar Jun 29 '17 16:06 simleo

I have a question. Will there be any specified way of converting measurements from one unit to another? Say celsius to kelvin or fahrenheit. Or is this outside the scope of the spec?

Kenji-K avatar Oct 25 '17 02:10 Kenji-K

@Kenji-K this would be outside of the spec - it would be something a tool would implement (but the spec could form the basis for that tool's API)
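
A sketch of what such a tool might look like for the temperature example, assuming it reads the unit string from the field descriptor. The unit labels (degC, degF, K) and the function name are hypothetical choices, not something the spec defines.

```python
# (scale, offset) pairs such that kelvin = value * scale + offset.
# Temperature needs an offset as well as a scale, so a single "factor"
# is not enough for these conversions.
TO_KELVIN = {
    "K": (1.0, 0.0),
    "degC": (1.0, 273.15),
    "degF": (5.0 / 9.0, 459.67 * 5.0 / 9.0),
}


def convert_temperature(value, from_unit, to_unit):
    """Convert a temperature by going through kelvin as the common base."""
    scale, offset = TO_KELVIN[from_unit]
    kelvin = value * scale + offset
    scale, offset = TO_KELVIN[to_unit]
    return (kelvin - offset) / scale
```

A tool could call convert_temperature(cell, field["unit"], "K") for every cell of a column whose field declares a temperature unit.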

rufuspollock avatar Nov 04 '17 10:11 rufuspollock

I also think we may want to move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs /cc @danfowler @pwalsh - now #537

@pwalsh @roll

rufuspollock avatar Nov 04 '17 10:11 rufuspollock

Hey, is there still interest in this feature? We (in the French administration) would use it: basically, tools consuming Table Schema would infer some behavior from the unit, when defined. Is there any way we could help it land in the spec?

yohanboniface avatar Dec 19 '22 18:12 yohanboniface

Yes, also interested, to use it for Camtrap DP. Although one can of course extend the Frictionless Table Schema as desired (e.g. by adding a unit property to each field), I'd rather have this as part of the core Table Schema itself.

peterdesmet avatar Dec 19 '22 19:12 peterdesmet

@yohanboniface yes, a lot of interest. The first step would be a detailed pattern. Note @Stephen-Gates had a go at that in https://github.com/frictionlessdata/specs/pull/607 - we are really open to getting a pattern and then turning that into part of the spec.

rufuspollock avatar Dec 20 '22 09:12 rufuspollock