Zipped GeoPackage files and media type

Open heidivanparys opened this issue 4 years ago • 11 comments

Is it common to zip GeoPackage files? If yes, should a media type, application/geopackage+sqlite3+zip, be registered for that at IANA?

In that way, an API conforming to OGC API - Features and the (draft) INSPIRE good practice building on OGC API - Features could link to such a zipped GeoPackage file using that media type.

"links": [
  { ... },
  { "href": "https://download.my-org.eu/buildings.zip",
    "rel": "enclosure",
    "type": "application/geopackage+sqlite3+zip",
    "title": "Download the dataset as a GeoPackage (CRS: EPSG:25832)",
    "length": 472546 },
  { ... }
  ],
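A client could use the `type` member to pick the download it can handle. The following is a minimal sketch, not part of OGC API - Features itself; the function name `find_download_link` is hypothetical, and the media type is the one proposed in this issue, not one assumed to be registered.

```python
from typing import Optional


def find_download_link(links: list, media_type: str) -> Optional[dict]:
    """Return the first 'enclosure' link whose type equals media_type, else None."""
    for link in links:
        if link.get("rel") == "enclosure" and link.get("type") == media_type:
            return link
    return None


# Example links array, modelled on the snippet above.
links = [
    {
        "href": "https://download.my-org.eu/buildings.zip",
        "rel": "enclosure",
        "type": "application/geopackage+sqlite3+zip",  # proposed, hypothetical
        "length": 472546,
    },
]

zipped = find_download_link(links, "application/geopackage+sqlite3+zip")
```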

See also

heidivanparys avatar Nov 17 '20 13:11 heidivanparys

@heidivanparys Since HTTP already supports Content-Encoding as a mechanism to exchange the data compressed, would there still be enough value in a dedicated application/geopackage+sqlite3+zip media type? The same +zip question kind of applies to all formats that an OGC API might deliver.

Maybe there is still value for very large files, so that they can be saved directly with the .zip extension, and so that the server does not have to compress them on the fly in some cases?

jerstlouis avatar Feb 02 '21 17:02 jerstlouis

@heidivanparys any thoughts on @jerstlouis 's comment?

jyutzler avatar Mar 02 '21 16:03 jyutzler

I have mixed thoughts on this. It makes sense because lots of +zip media types are registered at IANA but, at the same time, application/vnd.sqlite3+zip is not registered. Is it common to distribute sqlite3 files compressed?

fjlopez avatar Mar 02 '21 21:03 fjlopez

Since HTTP already supports Content-Encoding as a mechanism to exchange the data compressed, would there still be enough value in a dedicated application/geopackage+sqlite3+zip media type? The same +zip question kind of applies to all formats that an OGC API might deliver.

Maybe there is still value for very large files, so that they can be saved directly with the .zip extension, and so that the server does not have to compress them on the fly in some cases?

Is it common to distribute sqlite3 files compressed?

@jerstlouis @fjlopez I don't know what is common practice, but I can describe the practice at the agency where I work. One of our distribution channels is the Danish Map Supply. One of the ways you can get data from the Danish Map Supply is by downloading a dataset or a predefined subset of a dataset from the Map Supply's FTP server.

A host of predefined sections of data sets are readily available for download. These are both sections of historical data sets and sections from updated data sets that are updated regularly to reflect the newest available data. E.g. the matricular maps are updated every two months.

The FTP server stores (subsets of) datasets in different formats. I had a look again, and almost all files are zipped. So the shapefiles, GML files, MapInfo files, etc. are compressed and then put on the FTP server, from where users can retrieve those zip files.

Links to those zip files, along with their media types, also appear in our Atom feeds; see e.g. https://download.kortforsyningen.dk/sites/default/files/feeds/NamedPlace.xml:

<entry xml:lang="da">
    <title>DK INSPIRE NamedPlace</title>
    <!-- ... -->
    <link
        rel="alternate"
        href="ftp://ftp.kortforsyningen.dk/atomfeeds/INSPIRE/GML/EPSG_3044/DK_NamedPlace.gml.gz"
        type="application/x-gmz"
        length="109479325"
        title="DK INSPIRE NamedPlace"
        hreflang="da"/>
    <!-- ... -->
    <id>ftp://ftp.kortforsyningen.dk/atomfeeds/INSPIRE/GML/EPSG_3044/DK_NamedPlace.gml.gz</id>
    <!-- ... -->
  </entry>

(Media type application/x-gmz is described on https://inspire.ec.europa.eu/media-types/application/x-gmz).

I have mixed thoughts on this. It makes sense because lots of +zip media types are registered at IANA but, at the same time, application/vnd.sqlite3+zip is not registered. Is it common to distribute sqlite3 files compressed?

I am not convinced that we can conclude that it is not common to distribute sqlite3 files compressed just because application/vnd.sqlite3+zip is not registered. Another explanation could be that nobody cared to register application/vnd.sqlite3+zip because there is no need to comply with a certain specification or best practice.

heidivanparys avatar Mar 11 '21 08:03 heidivanparys

IMHO, the discussion of distributing GeoPackage as compressed files and the need to register an IANA media type for that case should not be mixed:

  • I agree that GeoPackage files can be distributed zipped with a proper name (e.g. name.gpkg.zip).
  • I think that there is no need to register a specific media type because RFC 6839, section 3.6 ("The +zip Structured Syntax Suffix"), defines when and how to use +zip, and hence application/geopackage+sqlite3+zip is OK as application/geopackage+sqlite3 is already registered.

RFC 6839 may explain why application/vnd.sqlite3+zip has not been registered.

fjlopez avatar Mar 11 '21 10:03 fjlopez

I think that there is no need to register a specific media type because RFC 6839, section 3.6 ("The +zip Structured Syntax Suffix"), defines when and how to use +zip, and hence application/geopackage+sqlite3+zip is OK as application/geopackage+sqlite3 is already registered.

Earlier, I made the same assumption. However, in another, similar discussion, on zipped GeoJSON files (the relevant part starting here), @cportele wrote the following in this comment:

[...] In my understanding 6839 states rules for media types with a suffix like "+zip". It does not say a suffix "+zip" may be added to any existing media type. Something like application/geo+json+zip would not be a valid media type. It would still need to be registered with IANA. [...]

heidivanparys avatar Mar 11 '21 11:03 heidivanparys

I agree with you, my assumption was wrong. See this excerpt from RFC 6838, Media Type Specifications and Registration Procedures:

Media types that make use of a named structured syntax SHOULD use the appropriate registered "+suffix" for that structured syntax when they are registered.

Reviewing the IANA registry of structured syntax suffixes, I see that +gzip is also registered. But it makes sense to register only application/geopackage+sqlite3+zip, given the popularity and availability of the ZIP format.

fjlopez avatar Mar 11 '21 19:03 fjlopez

So the consensus is to ask IANA to register application/geopackage+sqlite3+zip? I just want to be sure before I move forward.

jyutzler avatar Mar 11 '21 19:03 jyutzler

Also please keep in mind that even if the media type is application/geopackage+sqlite3, the data can still be transferred compressed by means of Accept-Encoding. Unless the visualization client directly supports zipped GeoPackage, this avoids the extra step and data duplication of having to extract the archive.
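As a rough illustration of that mechanism, here is a network-free sketch of how a server might pick a content coding from Accept-Encoding and how a client undoes it. The function names `encode_response` and `decode_response` are hypothetical, gzip is used only as an example coding, and q-values in the header are deliberately ignored to keep the sketch short.

```python
import gzip


def encode_response(body: bytes, accept_encoding: str) -> tuple:
    """Server side: gzip the body only if the client lists gzip (q-values ignored)."""
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    # The media type stays the same; only the transfer representation changes.
    headers = {"Content-Type": "application/geopackage+sqlite3"}
    if "gzip" in offered:
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body


def decode_response(headers: dict, body: bytes) -> bytes:
    """Client side: undo the content coding, yielding the original GeoPackage bytes."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


# Placeholder payload that starts with the SQLite file header string.
original = b"SQLite format 3\x00" + b"\x00" * 84
headers, wire_body = encode_response(original, "gzip, deflate")
restored = decode_response(headers, wire_body)
```

In this model the client ends up with the uncompressed file without ever handling an archive, which is the point made above.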

jerstlouis avatar Mar 11 '21 19:03 jerstlouis

@jerstlouis there are scenarios where having +zip is needed. For example, we can have links to GeoPackages in an Atom file that point to:

  • HTTP servers with content negotiation enabled. The link type can be application/geopackage+sqlite3 and the user agent may negotiate whether the server sends the GeoPackage file compressed or not. ✔️
  • FTP servers or HTTP servers without content negotiation enabled. Here we have three cases:
    • If the file served is not compressed, we must use application/geopackage+sqlite3. ✔️
    • If the file served is compressed in ZIP format and the link type is application/geopackage+sqlite3, the user agent may think that the GeoPackage file is broken ❌ or it must have a method to sniff the mime type. 🤞
    • If the file served is compressed in ZIP format and the link type is application/geopackage+sqlite3+zip, the user agent will uncompress it and then use the GeoPackage. ✔️
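The cases above can be sketched as a small client-side routine. This is only an illustration under stated assumptions: the function name `extract_geopackage` is hypothetical, the +zip media type is the one proposed in this thread, the archive is assumed to hold exactly one .gpkg member, and the sniffing relies on the well-known leading bytes of ZIP archives and SQLite database files.

```python
import io
import zipfile

GPKG_TYPE = "application/geopackage+sqlite3"
GPKG_ZIP_TYPE = "application/geopackage+sqlite3+zip"  # proposed in this thread

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of a SQLite database file
ZIP_MAGIC = b"PK\x03\x04"              # local file header of a non-empty ZIP archive


def extract_geopackage(payload: bytes, link_type: str) -> bytes:
    """Return raw GeoPackage bytes for a downloaded payload and its declared link type."""
    looks_zipped = payload.startswith(ZIP_MAGIC)
    # Unzip when the link says so, or when sniffing reveals a mislabelled ZIP.
    if link_type == GPKG_ZIP_TYPE or (link_type == GPKG_TYPE and looks_zipped):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            names = [n for n in archive.namelist() if n.lower().endswith(".gpkg")]
            if len(names) != 1:
                raise ValueError("expected exactly one .gpkg member in the archive")
            payload = archive.read(names[0])
    if not payload.startswith(SQLITE_MAGIC):
        raise ValueError("payload is not a SQLite database")
    return payload
```

With a correctly typed link the client never needs to sniff; the sniffing branch only covers the mislabelled case marked 🤞 above.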

fjlopez avatar Mar 11 '21 20:03 fjlopez

Assigning to @ogcscotts to contact IANA.

jyutzler avatar Apr 14 '21 14:04 jyutzler