
import embedded image metadata during file upload

Open mbjones opened this issue 1 year ago • 2 comments

Describe the feature you'd like

Hi all! This is a feature request from Olaf/Steve (and UND) for importing the metadata embedded in JPG and other photographic (electro-optical) imagery when we upload data to the DRP. Support for other image formats (TIFF, GeoTIFF, etc.) would be very helpful as well. Thanks!

The particular request was to 1) automatically detect and import geo-location bounding boxes/points for each image, and 2) automatically detect and import image keywords.
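For the geo-location part, here is a minimal sketch of the arithmetic involved, not a proposed implementation. EXIF stores GPS coordinates as three rational numbers (degrees, minutes, seconds) in `GPSLatitude`/`GPSLongitude`, plus a hemisphere letter in `GPSLatitudeRef`/`GPSLongitudeRef`. The helper name `dms_to_decimal` and the sample values are hypothetical, and how the tags get read out of the file (which EXIF library, client or server side) is left open.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert an EXIF (degrees, minutes, seconds) triple to signed decimal degrees.

    dms: three numbers (EXIF stores them as rationals; Fraction, int, or float work).
    ref: hemisphere letter from GPSLatitudeRef / GPSLongitudeRef ("N", "S", "E", "W").
    """
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative in decimal degrees.
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical values, as an EXIF reader might return them.
lat = dms_to_decimal((Fraction(64), Fraction(30), Fraction(0)), "N")
lon = dms_to_decimal((Fraction(147), Fraction(45), Fraction(0)), "W")
print(lat, lon)
```

The resulting point could then feed whatever spatial coverage field the metadata record uses.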

Is your feature request related to a problem? Please describe.

Uploading metadata from (tens of) thousands of images is too time-consuming, but people want to see the individual file locations on the web map search.

Additional context

So many bounding boxes would likely be prohibitive. We might need to redesign how spatial search works for this to be viable. I'm not sure if geohashes are adequate.

The team shared a screenshot from Lightroom showing what one investigator would like to see. I'll come back with that when I get a copy.

mbjones • Feb 13 '24 18:02

My sense is that most of the image metadata will be single lat/lon points rather than bounding boxes, so I don't think geohashes would be the best way to represent them. I don't have a better idea for displaying them, however.

If the images do have more sophisticated geographic metadata such as a world file or other orthorectification information, it may make more sense to display them as rasters draped over the terrain.

iannesbitt • Feb 13 '24 23:02

Point data is easily represented in geohashes. For each geohash level, what we are indicating is whether the feature overlaps that geohash's cell. So, for a point, that is really just a contains query. Geohashes at level 9 (which is the max level we index in DataONE) are 4.8 m by 4.8 m at the equator (see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geohashgrid-aggregation.html#_cell_dimensions_at_the_equator), which is more than adequate for most of our search and display purposes. Higher levels have sub-meter precision. Here's an example query showing our indexed geohash values:

https://cn.dataone.org/cn/v2/query/solr/?q=formatId:eml+AND+geohash_1:*&fl=identifier,formatId,geohash_1,geohash_2,geohash_3,geohash_4,geohash_5,geohash_9&wt=json
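To make the point case concrete, here is a rough sketch of the standard geohash encoding applied to a single lat/lon, producing one value per level. It is not the DataONE indexer code; the function names are made up for illustration, and the assumption that every level from `geohash_1` through `geohash_9` is a field follows the naming in the query above (which only lists levels 1 to 5 and 9).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=9):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_levels(lat, lon, max_level=9):
    """Return {field_name: geohash} per level; field names are assumed."""
    full = geohash_encode(lat, lon, max_level)
    # A point lies in exactly one cell per level, and each coarser cell
    # is a prefix of the finer one.
    return {f"geohash_{n}": full[:n] for n in range(1, max_level + 1)}


print(geohash_levels(57.64911, 10.40744))
```

Because a point yields exactly one geohash per level, tens of thousands of image locations would add one value per level each, rather than the many cells a large bounding box can overlap.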

mbjones • Feb 16 '24 22:02