specification icon indicating copy to clipboard operation
specification copied to clipboard

Suggested improvement to key relationships for phone, contact and address entities

Open rasmus-storjohann-PG opened this issue 6 years ago • 2 comments

The phone and contact entities are child records of several different kinds of parent entities. This is implemented by these entities having multiple foreign keys, most of which will be NULL.

Presumably the assumption is that phone and contact entities will each have exactly one parent entity. However, the schema can express several other situations:

  • A child record can have no parent, which I guess would always be invalid, but this cannot easily be enforced in the database.
  • A child record can have multiple parents of different entity types. Maybe that could be valid, it's unclear and could lead to confusion and divergence among implementations. We will need validation logic outside the database to enforce this, leading to possible incompatibilities between different implementations.
  • In contrast, the key scheme does not permit multiple parents of the same entity type.

Relations that have this problem include the phone -> (parent location, service, organization, contact, service_at_location), contact -> (parent organization, service, service_at_location).

A better way to implement the same relationships is to have many2many relationships from phone and contact entities to the parents of each type, i.e. location_phone, service_phone, organization_phone, etc. It's true that we'd get a lot of spanning tables, especially phone_at_service_at_location is getting a bit complex, but the data would be much easier to keep valid. It will also be easier to write code against these tables, since we'd get away from having so many foreign keys that may be NULL. Every time a foreign key is NOT NULL, that's a dozen fewer times I have to check against NULL in my code.

Postal and physical addresses entities have a similar problem. There is a lot of duplication between the two address entities. I think a better way to implement the two kinds of address is to use this solution. This would mean that there would just be one table of addresses and the many2many relationship between locations and addresses would have an attribute indicating whether it's a postal or physical address. A UNIQUE constraint on the location_id and address_type fields on the spanning table could be used to ensure that each location has at most one postal address and at most one physical address.

rasmus-storjohann-PG avatar Jan 19 '18 19:01 rasmus-storjohann-PG

@rasmus-storjohann-PG Thanks for your comments here (and on other issues).

This is definitely a major issue in the current data structure, and one inherited from the first version of HSDS. In http://docs.openreferral.org/en/latest/hsds/logical_model/ we state that "When a single row contains multiple foreign keys, these will be interpreted as 'OR' relationships." but in the data package spec we don't have a way to express this in machine-readable form.

I also recognise it hits the problem you note that "A child record can have multiple parents of different entity types. Maybe that could be valid, it's unclear and could lead to confusion and divergence among implementations.". Unfortunately without moving to a 2.0 that clarifies this, I'm not sure there is backwards compatible way to tidy this up.

Just to check I'm understanding: does your many2many proposal about imply a set of associative tables?

I think my preference would be for us to be moving away from the current semi-relational model of HSDS, towards a nested tree structure (which is the direction HSDA takes us in anyway). That would add constraints to stop us having children without parents etc. expressed, and should make it easier to iterate over and parse data - but would leave the choice of how to normalise data into a relational database up to applications.

timgdavies avatar Jan 31 '18 19:01 timgdavies

@timgdavies: yes, by many2many I mean associative tables as per the link you provided. I'm interested to know more about what a nested tree structure is, I don't see a lot of detail on that in that link.

I wanted to add a comment here following our experience implementing the Address piece. As I suggested above, we went with this approach combined with a UNIQUE constraint to ensure that each location only has one address of each address type. However, since the set of address types is unlikely to change, this is functionally equivalent but more complex than this approach

Location
id
physical_address_id
postal_address_id
etc.
Address
id
attention
address_1
etc.

This uses just one address table, and the foreign key goes the other way, i.e. the location points to the address, not the other way as HSDS currently suggests. In the common case where the two addresses are identical, it would also only be stored once.

rasmus-storjohann-PG avatar Feb 08 '18 19:02 rasmus-storjohann-PG

Closing as I think we've addressed this in 3.0 via the move to JSON Schema; each phone, contact, and address have ids and are nested in arrays in other objects. The issue of database normalization I believe should be left to specific implementors, and as long as valid HSDS JSON is transmitted then it doesn't matter if specific implementations choose to have several 1:many or having many:many relationships between entities.

mrshll1001 avatar Nov 20 '23 15:11 mrshll1001