
USGS Map Locator PDFs not importing as geospatial

Open kylecorry31 opened this issue 1 year ago • 8 comments

Investigate why these PDFs are not importing as geospatial and update the PDF parser to support this format: USGS Map Locator, such as Block Island

kylecorry31 avatar Oct 08 '23 20:10 kylecorry31

After inspecting the PDF, this does not appear to be a bug in Trail Sense. The PDFs do not have geospatial information, so this may be working as intended.

I want to do one more check before I close this out in case there's a different geospatial format being used that is non-standard.

kylecorry31 avatar Oct 14 '23 00:10 kylecorry31

https://www.usgs.gov/faqs/what-geopdfr

https://www.usgs.gov/faqs/what-geomark-and-do-usgs-topographic-maps-have-it

kylecorry31 avatar Oct 14 '23 11:10 kylecorry31

If this is using a different geospatial format, it must be the Geomark format, because I can't find any standard geospatial PDF metadata in the file.

kylecorry31 avatar Oct 14 '23 13:10 kylecorry31

What an interesting rabbit hole this led me down. That Block Island PDF is from 2021, so it should use this new mystery format. Annoying that USGS just says "an Open Source GeoPDF format" (misusing TerraGo's trademark instead of using the generic "geospatial PDF") without being more specific or providing a link.

The nearby Kingston map might be a better example, because it is available in both 2015 and 2018 versions, so presumably provides a sample of both formats.

Wikipedia says a Geospatial PDF (the generic) "is most commonly encoded in one of two ways: the OGC best practice; and as Adobe's proposed geospatial extensions to ISO 32000." About GeoPDF (the trademark), it says "GeoPDF products conform to published specifications including both the OGC best practice for PDF georegistration as well as Adobe's proposed geospatial extensions to ISO 32000".

The Library of Congress says of GeoPDF that it is proprietary and undocumented but that TerraGo claims the files they generate "comply with" either OGC Best Practice or Adobe.

So TerraGo is somehow both different and special, but also the same? Strange.

The Wikipedia article mentions GDAL as one piece of software that supports Geospatial PDF. The GDAL PDF driver page says, "The neatline (for OGC best practice) or the bounding box (Adobe style) will be reported as a NEATLINE metadata item, so that it can be later used as a cutline for the warping algorithm."

I tried running gdalinfo on the Block Island file and both Kingston files, and it was able to pull coordinate data from all three. So that might be a place to look for info about how to read the data.
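For anyone who wants to repeat that check in a script, here is a rough Python sketch that shells out to gdalinfo with its `-json` flag and picks out the georeferencing fields. It assumes the gdalinfo CLI is on the PATH. The JSON key names (`driverShortName`, `cornerCoordinates`, and the default metadata domain under `""`) reflect my understanding of the `-json` output and should be checked against a real report. The function names are made up for illustration.

```python
import json
import subprocess


def summarize_gdalinfo(info: dict) -> dict:
    """Pick the georeferencing fields out of parsed `gdalinfo -json` output."""
    # Assumption: dataset-level metadata lives in the default domain, keyed "".
    default_domain = info.get("metadata", {}).get("", {})
    return {
        "driver": info.get("driverShortName"),
        "corners": info.get("cornerCoordinates"),
        "neatline": default_domain.get("NEATLINE"),
    }


def run_gdalinfo(pdf_path: str) -> dict:
    """Run the gdalinfo CLI on a file and summarize its JSON report."""
    result = subprocess.run(
        ["gdalinfo", "-json", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise if gdalinfo exits with an error
    )
    return summarize_gdalinfo(json.loads(result.stdout))
```

Calling `run_gdalinfo` on each downloaded PDF and comparing the summaries would show whether a NEATLINE item is reported for one file and missing for another.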

diggernet avatar Oct 15 '23 16:10 diggernet

Thank you for all the details, I'll check out those links. I'm going to treat this issue as a new geospatial PDF format. It is definitely not the ISO 32000 geospatial format: I extracted all the readable metadata from the document and didn't see anything I recognized, nor could I find any coordinates. They might be compressed within the streams of the PDF (the LOC page on the OGC best practice mentions that compression will likely be in place):

"Judging from example files, this georegistration encoding can usually be read by using an ASCII text editor to open the PDF file. In practice, most PDFs have compression filters applied to most of the file content."

But it also mentions: "A PDF file including PDF georegistration 2.2 encoding will be recognized by the existence of an LGIDict entry associated with at least one page in the PDF." I can't find that entry in the newer PDFs, but I can find it in the older Kingston one.
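A rough way to script this check, including the compressed-stream case, is sketched below in Python. It looks for the LGIDict marker from the quote above, and for `/GPTS`, which is my recollection of a key used by Adobe's geospatial extension (an assumption to verify against the spec). It searches both the raw bytes and any stream that inflates with zlib. The function names are hypothetical, and this is a byte scan rather than a real PDF parser.

```python
import re
import zlib

# Marker bytes per encoding. /LGIDict comes from the OGC best practice quote;
# /GPTS is an assumed marker for Adobe's geospatial extension.
MARKERS = {
    "ogc_best_practice": b"/LGIDict",
    "adobe_iso32000_extension": b"/GPTS",
}

# Raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes):
    """Yield the contents of every stream that zlib can decompress."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or uses another filter


def detect_georegistration(pdf_bytes: bytes) -> set:
    """Return the names of the encodings whose marker appears in the file."""
    haystacks = [pdf_bytes, *inflate_streams(pdf_bytes)]
    return {
        name
        for name, needle in MARKERS.items()
        if any(needle in haystack for haystack in haystacks)
    }
```

An empty result for the newer files, even after inflating the streams, would support the idea that they use neither of the two documented encodings. A hit would mean the metadata was only hidden by compression.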

kylecorry31 avatar Oct 15 '23 17:10 kylecorry31

Ok, looking closer, I see that Block Island 2021 is explicitly described as "GEOSPATIAL PDF", as is Kingston 2018. Kingston 2015 is unspecified. But here is Block Island 1998, which is explicitly described as "GEOPDF". gdalinfo is NOT able to read any data from Block Island 1998, but it can read Kingston 2015, so Kingston 2015 must also be Geospatial PDF rather than GeoPDF. The output for Kingston 2015 and Kingston 2018 has some structural differences, though, so maybe those represent the OGC vs Adobe variants? To be clear, I don't care about TerraGo/GeoPDF/GeoMark files. This request is for the current Geospatial PDF files.

diggernet avatar Oct 15 '23 20:10 diggernet

Unfortunately, none of the PDFs (Kingston or Block Island) follow the ISO 32000 geospatial format.

I see some are likely to be OGC (the older ones) - but I'm still not sure what format the newer PDFs are, since they are not ISO 32000. So I think either way, this would be adding support for a new format (OGC and whatever USGS is calling "geospatial PDF")

kylecorry31 avatar Oct 15 '23 21:10 kylecorry31

What a pain. It's a shame that USGS doesn't document better what they are doing. But since the real goal here is USGS maps, not necessarily their PDFs specifically, looks like there could be an easier solution. I just discovered that at topoView they provide the downloads in several other formats:

  • GeoTIFF: Topo map and orthophoto TIFF images, plus some other data files. GDAL can pull the data, but after learning about GeoPDF I'm scared to ask what GeoTIFF actually is.
  • JPG: Topo map and orthophoto JPG images. I don't think these are actually georeferenced.
  • KMZ: KMZ files containing a topo map or orthophoto JPG, plus a very simple KML file positioning the image as an overlay.

I think the KMZ is probably the simplest of these. It is basically just a JPG with some self-documenting georeferencing, as you see in this excerpt:

<GroundOverlay>
  <name>RI_Block_Island_20210309_TM_kmz</name>
  <Icon>
    <href>RI_Block_Island_20210309_TM_kmz.jpg</href>
  </Icon>
  <LatLonBox>
    <north>41.25000844558588398</north>
    <south>41.12500841692346398</south>
    <east>-71.50000066458208892</east>
    <west>-71.62500071175118421</west>
  </LatLonBox>
</GroundOverlay>

Of course, that's just another way of saying "adding support for a new format", so that's where this gets filed no matter which approach you take (or don't take - it's your time to spend as you wish).
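To show how little parsing the KMZ route needs, here is a minimal Python sketch that reads the image name and the LatLonBox from a GroundOverlay like the excerpt above. It assumes at most one overlay per file and ignores any rotation element. It strips XML namespaces so it works whether or not the KML declares one. The function and class names are my own, not from any library.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class LatLonBox(NamedTuple):
    north: float
    south: float
    east: float
    west: float


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_ground_overlay(kml_text: str):
    """Return (image href, LatLonBox) for the first GroundOverlay found."""
    root = ET.fromstring(kml_text)
    overlay = next(
        el for el in root.iter() if local_name(el.tag) == "GroundOverlay"
    )
    # Map each descendant's local tag name to its trimmed text content.
    values = {
        local_name(el.tag): (el.text or "").strip() for el in overlay.iter()
    }
    box = LatLonBox(*(float(values[key]) for key in LatLonBox._fields))
    return values["href"], box
```

Since a KMZ is a zip archive, the KML text and the JPG can be pulled out with the standard `zipfile` module before calling this. The four bounds are then enough to place the image as an axis-aligned overlay.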

diggernet avatar Oct 16 '23 03:10 diggernet