
Add information about the precision to lat_lon

Open wdduncan opened this issue 5 years ago • 8 comments

We need to capture the precision of the lat_lon. (1 decimal, 2 decimal, etc.)

cc @cmungall @ramonawalls @lschriml

wdduncan avatar Nov 11 '20 21:11 wdduncan

Darwin Core has http://rs.tdwg.org/dwc/terms/coordinatePrecision, plus a lot of other terms that describe location certainty: https://dwc.tdwg.org/terms/

John

On Wed, Nov 11, 2020 at 1:51 PM Bill Duncan wrote:

We need to capture the precision of the lat_lon. (1 decimal, 2 decimal, etc.) Need to be clear which format is required.

cc @cmungall https://github.com/cmungall



jdeck88 avatar Nov 11 '20 22:11 jdeck88

Or capture the horizontal accuracy as a separate field. The current MIxS example uses a float with 6 decimal digits, a precision only attainable with a highly accurate GPS. Most phones are accurate to 4 decimal places, which is ~11 meters. Also, it would be good to consider working with NCBI BioSample to be consistent on format. From the BioSample documentation for lat long:

"The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format "d[d.dddd] N|S d[dd.dddd] W|E", eg, 38.98 N 77.11 W"

vs MIxS:

"The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system"

which means the user has to use two different formats depending on whether they submit using a MIxS template or the BioSample template.

StantonMartin avatar Nov 11 '20 22:11 StantonMartin
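
To illustrate the format mismatch described above, here is a minimal sketch that converts between the BioSample-style string ("38.98 N 77.11 W") and signed decimal degrees. The regular expression and both function names are assumptions made for this example; neither comes from the BioSample or MIxS documentation. Coordinates are kept as strings so that the digits the submitter typed are not altered.

```python
import re

# Assumed reading of the documented "d[d.dddd] N|S d[dd.dddd] W|E" format.
BIOSAMPLE_RE = re.compile(
    r"^\s*(\d{1,2}(?:\.\d+)?)\s*([NS])\s+(\d{1,3}(?:\.\d+)?)\s*([WE])\s*$"
)


def biosample_to_signed(text):
    """Return (lat, lon) as signed decimal-degree strings, digits preserved."""
    match = BIOSAMPLE_RE.match(text)
    if match is None:
        raise ValueError(f"not a BioSample-style lat/lon: {text!r}")
    lat, ns, lon, we = match.groups()
    if float(lat) > 90 or float(lon) > 180:
        raise ValueError(f"coordinate out of range: {text!r}")
    # South and West become negative in signed decimal degrees.
    lat_signed = ("-" if ns == "S" else "") + lat
    lon_signed = ("-" if we == "W" else "") + lon
    return lat_signed, lon_signed


def signed_to_biosample(lat, lon):
    """Return the BioSample-style string for signed decimal-degree strings."""
    lat_s, lon_s = str(lat), str(lon)
    ns = "S" if lat_s.startswith("-") else "N"
    we = "W" if lon_s.startswith("-") else "E"
    return f"{lat_s.lstrip('-')} {ns} {lon_s.lstrip('-')} {we}"


print(biosample_to_signed("38.98 N 77.11 W"))
print(signed_to_biosample("38.98", "-77.11"))
```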

For MIxS v6.0: add the recommendation for decimal degrees; we can adopt the Darwin Core latitude and longitude. Recommendation: at least 4 decimals, ideally 5. INSDC can add to the recommendation.

  • Do we want another field that addresses the specificity/accuracy of the device?

Darwin Core: coordinatePrecision Property
Identifier: http://rs.tdwg.org/dwc/terms/coordinatePrecision
Definition: A decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude.
Comments:
Examples: 0.00001 (normal GPS limit for decimal degrees). 0.000278 (nearest second). 0.01667 (nearest minute). 1.0 (nearest degree).


lschriml avatar Nov 23 '20 16:11 lschriml
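
To make the "at least 4 decimals, ideally 5" recommendation concrete, here is a minimal sketch of how the number of decimal places maps to a coordinatePrecision-style value and to an approximate distance on the ground. The function names are hypothetical, and the figure of roughly 111,320 meters per degree is an approximation assumed for illustration, not something stated in this thread.

```python
import math

# Assumption: about 111,320 m per degree (approximate value near the equator).
METERS_PER_DEGREE = 111_320


def precision_from_decimals(n_decimals):
    """Smallest step, in degrees, expressible with n_decimals decimal places."""
    return 10 ** -n_decimals


def approx_meters(n_decimals, latitude_deg=0.0):
    """Return (north_south_m, east_west_m) for one step at the given latitude."""
    step = precision_from_decimals(n_decimals)
    north_south = step * METERS_PER_DEGREE
    # East-west distance per degree of longitude shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for n in (2, 4, 5, 6):
    ns, ew = approx_meters(n, latitude_deg=45.0)
    print(f"{n} decimals: ~{ns:.2f} m north-south, ~{ew:.2f} m east-west at 45 degrees")
```

Under these assumptions, 4 decimals corresponds to about 11 meters north-south, which matches the figure quoted earlier in the thread.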

+1 for splitting lat/lon and changing the guidance to have at least 4 decimals (see table on this page for primer).

We should be explicit that even if there are zeroes after the decimal point, they should be noted explicitly (e.g. 74.0000)

I can also see good reason to add a field for the accuracy of the positioning device (in meters).

pbuttigieg avatar Nov 23 '20 16:11 pbuttigieg
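
The point about writing trailing zeroes explicitly can be sketched as follows. Parsing a value as a float discards trailing zeroes, so the stated precision survives only if the value is formatted (and stored) as text. The helper names below are hypothetical.

```python
def format_coordinate(value, n_decimals=4):
    """Format a coordinate with exactly n_decimals digits after the point."""
    return f"{float(value):.{n_decimals}f}"


def count_decimals(text):
    """Count the digits after the decimal point in a coordinate string."""
    _, _, fraction = text.partition(".")
    return len(fraction)


print(format_coordinate(74))          # trailing zeroes are written out
print(count_decimals("74.0000"))      # precision is visible in the text form
print(count_decimals(str(float("74.0000"))))  # lost after a float round trip
```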

Not sure if this is helpful, but there is also a DwC term coordinateUncertaintyInMeters for uncertainty on top of the already mentioned coordinatePrecision for precision.

coordinateUncertaintyInMeters

Identifier: http://rs.tdwg.org/dwc/terms/coordinateUncertaintyInMeters
Definition: The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates). Zero is not a valid value for this term.
Examples: 30 (reasonable lower limit of a GPS reading under good conditions if the actual precision was not recorded at the time). 71 (uncertainty for a UTM coordinate having 100 meter precision and a known spatial reference system).

coordinatePrecision

Identifier: http://rs.tdwg.org/dwc/terms/coordinatePrecision
Definition: A decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude.
Examples: 0.00001 (normal GPS limit for decimal degrees). 0.000278 (nearest second). 0.01667 (nearest minute). 1.0 (nearest degree).

For further consideration of location information, I thought this best practice may be a good reference.

ymgan avatar Nov 25 '20 10:11 ymgan

Also +1 for splitting lat_lon (a real pain in Excel when working with negative values) and adding terms for coordinatePrecision and coordinateUncertaintyInMeters. Maybe the definition of the coordinate fields should explain the meaning of the precision and the reason for the recommendations, because most scientists are not fully aware of that. This is especially the case when values were converted from degrees-minutes to decimals (e.g. 20°10' = 20.1666666667, which goes from a precision of several tens of meters to nearly a single cell...)

Best not to replace the current recommendation of reporting coordinates in decimal degrees and WGS84 (i.e. numeric values) with the INSDC format (which is basically changing a numeric measurement to text)

msweetlove avatar Nov 27 '20 10:11 msweetlove
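
The false-precision problem in the degrees-minutes example above can be sketched like this: convert to decimal degrees, then keep only as many decimals as the source resolution supports, and report that resolution as a coordinatePrecision-style value. The function name and the rounding rule (enough decimals to resolve one minute or one second) are assumptions for illustration, not an agreed MIxS rule.

```python
import math


def dm_to_decimal(degrees, minutes, seconds=None):
    """Convert degrees-minutes(-seconds) to (decimal_string, precision_in_degrees).

    Assumes a non-negative magnitude; the hemisphere sign is handled separately.
    """
    if seconds is None:
        value = degrees + minutes / 60
        precision = 1 / 60      # source resolved to the nearest minute
    else:
        value = degrees + minutes / 60 + seconds / 3600
        precision = 1 / 3600    # source resolved to the nearest second
    # Keep just enough decimals to resolve one unit of the source resolution.
    n_decimals = math.ceil(-math.log10(precision))
    return f"{value:.{n_decimals}f}", precision


text, precision = dm_to_decimal(20, 10)
print(text, round(precision, 5))
```

Here 20°10' is reported with two decimals rather than ten, and the accompanying precision value rounds to 0.01667, the "nearest minute" example quoted from Darwin Core.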

Following discussion during an earlier CIG call: splitting would cause many issues for end users, so the decision was not to split.

lschriml avatar Mar 04 '21 16:03 lschriml

Here is the current definition of lat_lon (not just the description)

Note that annotations are a semantically weak way of adding clarification to a term/slot definition.

So guidance about the maximum number of decimal places is provided, but not the minimum. If that's desired, we could add a regular expression constraint. And I'm guessing that the idea of providing additional slots for uncertainty was abandoned, but never recorded?

I have a suggestion for future discussions like this: if a decision has been made about an issue in a live meeting, some record of that meeting (like examples of who would experience what issues) should be linked.

turbomam avatar Oct 19 '23 00:10 turbomam
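
A regular expression constraint of the kind suggested above might look like the following sketch. It assumes lat_lon is written as two signed decimal numbers separated by a single space, and it takes the minimum of 4 decimals from the recommendation earlier in this thread; the actual lat_lon pattern in the schema is not reproduced here and may differ. Because a regular expression cannot easily express the numeric range, that check is done separately.

```python
import re

MIN_DECIMALS = 4  # assumption: taken from the recommendation in this thread

# Assumed layout: "<latitude> <longitude>", both signed decimal degrees.
LAT_LON_RE = re.compile(
    rf"^(-?\d{{1,2}}\.\d{{{MIN_DECIMALS},}}) (-?\d{{1,3}}\.\d{{{MIN_DECIMALS},}})$"
)


def is_valid_lat_lon(value):
    """True if value has enough decimals and falls within valid ranges."""
    match = LAT_LON_RE.match(value)
    if match is None:
        return False
    lat, lon = (float(part) for part in match.groups())
    return -90 <= lat <= 90 and -180 <= lon <= 180


for candidate in ("38.9800 -77.1100", "38.98 -77.11", "74.0000 0.0000"):
    print(candidate, is_valid_lat_lon(candidate))
```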