gdal icon indicating copy to clipboard operation
gdal copied to clipboard

[gpkg] Ensure that mapping tables are inserted into gpkg_contents

Open nyalldawson opened this issue 1 year ago • 12 comments
trafficstars

Fixes auto-created relationship mapping tables cannot be read from gpkg

nyalldawson avatar Feb 20 '24 02:02 nyalldawson

Coverage Status

coverage: 68.814%. remained the same when pulling 0cf5e7d2f1bc8ac2b444d9c0b384ee18fedba176 on nyalldawson:fix_mapping_table_not_in_contents into 760ba71abd4f895e24225cdce5ce354696a31312 on OSGeo:master.

coveralls avatar Feb 20 '24 03:02 coveralls

When I look at https://docs.ogc.org/is/18-000/18-000.html "Figure 2. Related Tables UML Diagram", there's no "references" relationship from gpkg_contents to a user mapping table, which would suggest they shoudn't be referenced in gpkg_contents. And "Requirement 5 – gpkgext_relations Base Table" mentions "For each row in gpkgext_relations, there SHALL be a table or view of the name referenced in base_table_name and that table or view SHALL have an entry in gpkg_contents.", and does not mention mapping_table_name, so it looks to be on purpose

If the issue is that they aren't visible as OGR layers when opening a GPKG, perhaps that should involve other changes in the opening sequence.

rouault avatar Feb 20 '24 08:02 rouault

@rouault hmm, seems messy (but for full transparency, I think this whole extension is overly complicated! 😂)

Would you be in favour of both approaches? Ie writing the mapping table to gpkg_contents, but then also scanning for tables not in gpkg_contents but which are in gpkgext_relations? I'm concerned that other (non gdal based) readers will not have any knowledge of gpkgext_relations and effectively these tables will become unreadable in those applications. If we do both approaches then at gdal will be as tolerant as possible...

nyalldawson avatar Feb 20 '24 09:02 nyalldawson

I suggest opening a discussion issue in https://github.com/opengeospatial/geopackage/issues.

jratike80 avatar Feb 20 '24 09:02 jratike80

@jratike80 done -- https://github.com/opengeospatial/geopackage/issues/679

nyalldawson avatar Feb 20 '24 09:02 nyalldawson

I was trying to think about the drawbacks of referencing a mapping table in gpkg_contents. One that I see is that a user could potentially drop such table, if not aware it is referencing the left and right table of the relationship. But that's not much different from dropping a table that is the target of a foreign key of another, or other cross-table relationships. If we don't get a response to your question, we might experiment with doing that.

And I agree that looking for gpkgext_relations to infer user-exposed tables is going to be messy code-wise.

rouault avatar Feb 20 '24 09:02 rouault

Looks like (according to the linked ticket discussion) it IS intended that they aren't in gpkg_contents.

But my gut feeling is that we should ignore the specification here and include them in gpkg_contents (and ALSO be tolerant and allow loading tables referenced only from gpkgext_relations). I tested with ArcGIS Pro (latest version) and it CANNOT read tables referenced only in gpkgext_relations. So that means that both existing GDAL releases and ESRI software can't access the mapping tables at all if we follow the specification exactly. Which makes the whole related tables extension rather... limited :laughing:

nyalldawson avatar Feb 20 '24 23:02 nyalldawson

Actually another possible issue with adding the mapping table to gpkg_contents is that:

  • if a hypothetical software would manage the Related Table Extension according to its letter
  • and would read a GPKG created by GDAL with the mapping table put in gpkg_contents,
  • and would drop the mapping table (e.g because the left and right table are removed) it could forget to remove it from gpkg_contents.

But I'm clearly playing the devil's advocate, and I'm not that overly concerned about such potential issues

So your plan sounds good to me. Obviously we should add a comment in the code about this extension to the extension... at the place where we insert the mapping table in gpkg_contents (and perhaps a note in https://gdal.org/drivers/vector/gpkg.html#relationships for developers of other software)

rouault avatar Feb 20 '24 23:02 rouault

So that means that both existing GDAL releases and ESRI software can't access the mapping tables at all if we follow the specification exactly

if you need a workaround on QGIS side with existing GDAL releases, I presume ExecuteSQL("SELECT * FROM the_mapping_table") could do that job.

rouault avatar Feb 20 '24 23:02 rouault

Just for completeness of discussion -- if we did strictly follow the standard then there's also an annoying situation we'd have to handle where:

  1. User manually creates a mapping table using standard gdal calls, since it's just a plain old table initially it'll be added to gpkg_contents
  2. User then uses gdal API to define the relationship, setting the mapping table name explicitly in the relationship definition.

Should we then force remove the table from gpkg_contents? 🤷‍♂️

nyalldawson avatar Feb 20 '24 23:02 nyalldawson

Should we then force remove the table from gpkg_contents? 🤷‍♂️

Probably ? I dunno... Let's just put it in gpkg_contents: there's no explicit requirement preventing it from being added to it after all...

For other readers, the existing AddRelationship() handles both the cases where the mapping table doesn't exist yet (but it doesn't look the table with be immediately visible through GetLayerCount()/GetLayer() without closing & reopening the dataset. if not, perhaps that should be done?), and creates it, or where it exists and checks it has the expected structure.

rouault avatar Feb 21 '24 00:02 rouault

@rouault

So your plan sounds good to me. Obviously we should add a comment in the code about this extension to the extension... at the place where we insert the mapping table in gpkg_contents (and perhaps a note in https://gdal.org/drivers/vector/gpkg.html#relationships for developers of other software)

Ok, updated accordingly

nyalldawson avatar Feb 21 '24 01:02 nyalldawson

@nyalldawson I guess the reading part will be for another pull request? (not totally sure if we'd want to backport it)

rouault avatar Feb 21 '24 10:02 rouault

Yeah, I think it can be handled separately

nyalldawson avatar Feb 21 '24 10:02 nyalldawson