Support units in Table Schema
Move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs
/cc @pwalsh
@rufuspollock before we move it here, it would be good to discuss how it is to be implemented. Is the proposal to implement this spec, as is, as part of Table Schema?
@pwalsh open to suggestions. I was thinking of keeping it separate but adding support for referencing it from table schema but not sure what is best.
I've drafted a pattern https://github.com/frictionlessdata/specs/pull/607 and started a discussion on the forum https://discuss.okfn.org/t/table-schema-units-pattern/6573
Notes from thread in the PR #607
@Stephen-Gates wrote:
@rufuspollock I put this up to stimulate conversation. I guess we need to resolve if we adopt the suggestion on the forum about using an existing specification UCUM, ISO 80000, BIPM.
@rufuspollock wrote back
@Stephen-Gates have you looked at any of those specs in detail? UCUM seems pretty heavy duty - is there a subset of that we could adopt / summarize?
We really just want something relatively simple that fits the 80/20 rule ...
Re ISO 80000 - is that open? If not that would be an issue for us ...
The BIPM stuff looks good though very physically oriented and fundamental.
/cc @dr-shorthair
@dr-shorthair wrote
UCUM may look heavy duty (though I don't think it is, really). But it provides an approach to build any unit-of-measure-symbol from the atomic elements.
This gets over the problem that any static set will come up short. A static list might look like 80-20, but what do you say to the people that need one of the 20 (which is still a lot of applications!).
@Stephen-Gates (/cc @dr-shorthair) I think our aim is to see if we can extract a subset of UCUM that gives us 80/20 and then say if you want more go to UCUM.
I have to say I think this could / should go in 2 stages:
- Move the spec back here (as it is - even if flawed)
- Then start upgrading it ... (this can be a new issue ...)
This keeps things clean. wdyt? And if so would that mean we could merge #607 as it is?
@Stephen-Gates any luck here on progressing this?
Sorry, been focussed on a new release of Data Curator. Haven’t forgotten.
Hi @Stephen-Gates (and also @rufuspollock + @pwalsh)! I wanted to let you know that @mbomhoff from Planet Microbe has been working with data packages for their oceanographic data and has been thinking about what units specs would work best for them. I wanted to tag Matt so he can keep updated on the specs units conversation, and also intro y'all in case you want to connect and discuss what units ideas Matt has. Thanks both 😄
@lwinfree Thanks Lilly! Our data packages are in https://github.com/hurwitzlab/planet-microbe-datapackages. For the time being we added a custom property unitRdfType.
@mbomhoff are you ok with the direction the draft PR was taking if we address the comments above?
Item 2. in the license is problematic:
Users shall not modify the Licensed Materials and may not distribute modified versions of the UCUM table (regardless of format) or UCUM Specification. Users shall not modify any existing contents, fields, description, or comments of the Licensed Materials, and may not add any new contents to it.
Unfortunately UCUM now appears to be an infrastructure orphan - I've not been able to make contact with Guenther Schadow for a couple of years now. Possibly retired. I'll try again.
@Stephen-Gates It looks like the draft spec is capable of describing all of the units that we use in our project, but I think our application falls under the case of using an existing spec. One of the goals of our project is to use ontologies to unify disparate datasets from various sources. To describe a field we supply an Environment Ontology (ENVO, http://environmentontology.org/) purl in the rdfType property and a Unit Ontology (UO, https://github.com/bio-ontology-research-group/unit-ontology) purl in a custom unitRdfType property. If the proposal is adopted we would add the unit property for consistency but most likely keep our existing rdfType and unitRdfType properties. For example, to describe the "depth" field we would use:
rdfType: "http://purl.obolibrary.org/obo/ENVO_3100031",
unitRdfType: "http://www.ontobee.org/ontology/UO?iri=http://purl.obolibrary.org/obo/UO_0000008",
unit: "m"
For us the UO purl provides stronger semantics and some additional info such as aliases (meter, metre) and a text description.
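As a rough illustration of the example above, here is a minimal sketch of how such a field descriptor might look as a Python dict, with a small helper that reads the unit information back out. The `"type": "number"` value and the `describe_unit` helper are assumptions made for illustration; `unitRdfType` is the custom property described above, and `unit` is the property proposed in the draft pattern, not part of the published Table Schema spec.

```python
import json

# Hypothetical field descriptor for the "depth" field. The "type" value is
# an assumption; "unitRdfType" is a custom property and "unit" is only the
# proposed property discussed in this thread.
depth_field = {
    "name": "depth",
    "type": "number",
    "rdfType": "http://purl.obolibrary.org/obo/ENVO_3100031",
    "unitRdfType": "http://www.ontobee.org/ontology/UO?iri=http://purl.obolibrary.org/obo/UO_0000008",
    "unit": "m",
}


def describe_unit(field):
    """Return the unit symbol and ontology IRI of a field (None if absent)."""
    return {"symbol": field.get("unit"), "iri": field.get("unitRdfType")}


# Serialise the descriptor as it might appear inside a schema's "fields" list.
print(json.dumps(depth_field, indent=2))
print(describe_unit(depth_field))
```

Keeping the short `unit` symbol next to the ontology IRI lets simple tools read the symbol while richer tools follow the IRI for aliases and descriptions.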
@Stephen-Gates any chance to look at this further? It sounds like we have to steer around UCUM atm.
UCUM only provides the terminal symbols, and a grammar to combine them into any UoM. So it is a mistake to talk about 80:20 provided by UCUM with respect to some finite set. UCUM probably provides 95%+, but by utilising the grammar.
Meanwhile, I have now tracked down the owner of UCUM so it's not dead yet.
Lilly Winfree directed me to this discussion after I asked her at PyData Austin how you handle units in your project. We have been dealing with unit standardization for over a year and can connect you to some unit specs, at least in the medical domain. However, you will see that we did a lot of the work already in the following project: https://clinicalunitmapping.com/ In the About page, you will see many publications that describe details. Currently we already bundle 4 unit specs/standards and have machine learning tools developed to address those - it is easy to add more specs. If you are interested in joining forces, please contact me by email outside this GitHub thread to see how it is possible. The project is currently not fully open, yet there is a desire by some stakeholders to keep some of its elements open. So I will appreciate offline communication before returning to this open thread and continuing public discussion. In any case, please note that some open standards contain UCUM, so your fear of the UCUM license was already handled by others successfully, so it may be possible to resolve this issue with some effort. Yet I need to see your issues first and discuss how this can be resolved offline. Looking forward to more communication.
@Jacob-Barhak this is great info - if you could share your experience and links that would help esp any key pointers. Your tip re UCUM is also very helpful. We will look at https://clinicalunitmapping.com/
So @rufuspollock , all documentation associated with the project is available in the about page: https://clinicalunitmapping.com/about
You will find many publications already and some presentations, yet no important code is currently open source. If your problem is small, then you should have enough pointers there to resolve your unit issues - yet if you are interested in a global solution for the units problem, we should schedule a video chat and talk.
Again, despite expressed desire by many stakeholders to make it an open source project, it is currently not fully available to the public. If you have ideas on how to make it available, I am open to suggestions.
If support for units were added, frictionless-py could output values with their units (maybe optionally) using Pint (https://pint.readthedocs.io/en/stable/), provided the units used in the spec are available in the Pint library.
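A minimal sketch of that idea, assuming a field descriptor carries the proposed `unit` property. The `attach_units` helper is hypothetical and is not part of frictionless-py; Pint is a third-party package, so the sketch falls back to bare values when Pint is not installed or does not know the unit.

```python
# Sketch only: "unit" is the property proposed in this thread, and
# attach_units is a hypothetical helper, not a frictionless-py API.
try:
    import pint  # third-party: pip install pint
except ImportError:  # keep the sketch usable without Pint
    pint = None

# One shared registry, since quantities from different registries do not mix.
ureg = pint.UnitRegistry() if pint is not None else None

field = {"name": "depth", "type": "number", "unit": "m"}


def attach_units(values, field):
    """Return values as Pint quantities if the field's unit is known to Pint."""
    unit = field.get("unit")
    if ureg is None or unit is None:
        return list(values)
    try:
        parsed = ureg.parse_units(unit)
    except pint.UndefinedUnitError:
        # The unit is not in Pint's registry: fall back to bare values.
        return list(values)
    return [ureg.Quantity(value, parsed) for value in values]


depths = attach_units([5.0, 120.5], field)
print(depths)
```

Making the conversion opt-in, as suggested above, would keep plain numeric output for users who do not have Pint installed.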