permitdata.org

Mutability and fields (e.g. created/updated)

Open bensheldon opened this issue 10 years ago • 7 comments

The schema should define resource lifecycles and accompanying fields. It should be strict about which modifications trigger a new updated timestamp (hopefully all of them, but it should declare that behavior).

This is important for ETLing and syncing data.

bensheldon avatar Aug 28 '15 13:08 bensheldon

@bensheldon Apologies for the late reply on this.

Can you expand on this with a quick example? Thanks!

mheadd avatar Sep 11 '15 15:09 mheadd

As a user syncing data between my own database and a BLDS data set
When a permit is added to the BLDS data set (not necessarily when it is accepted by the city), it should have created_at and updated_at touched
And when the data row is changed within the BLDS data set, it should have updated_at touched

From my experience with Open311, in which there is often both an underlying Ticket system and an intermediary/vendor/integrator serving the Open311 data, there was confusion around whether the timestamp fields represented the canonical Ticket system or the intermediary's data. This led to situations like:

A public Open311 record was modified, but this was not reflected in "updated_at" because the modification was the result of a change to a secondary dataset that was integrated by the intermediary, rather than the primary record changing. In this case, the intermediary interpreted "updated_at" to only reflect changes to the primary record, not any secondary records, even though the secondary change altered the data served by Open311.

An analogous situation here might be: a building permit was not changed, but a separate contractor form was amended, causing a change to a contractor1 field. IMO, this should trigger updated_at and this behavior should be part of the specification.
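To make the proposed behavior concrete, here is a minimal Python sketch, not anything from the BLDS spec itself. The class names (ServedPermit, PermitDataSet), the method names, the injectable clock, and the permit id "P-1" are all hypothetical; only created_at, updated_at and contractor1 come from the discussion above. The rule it illustrates: timestamps describe the row as served, so any change to a served field touches updated_at, regardless of which upstream form caused it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Default clock; tests can inject a deterministic one."""
    return datetime.now(timezone.utc)


@dataclass
class ServedPermit:
    """One row as published in the data set (hypothetical shape)."""
    permit_id: str
    fields: dict
    created_at: datetime
    updated_at: datetime


class PermitDataSet:
    """Toy in-memory data set that owns the timestamp semantics."""

    def __init__(self, clock=utc_now):
        self._clock = clock
        self._rows = {}

    def add(self, permit_id, fields):
        # Adding a row touches both created_at and updated_at.
        now = self._clock()
        row = ServedPermit(permit_id, dict(fields), created_at=now, updated_at=now)
        self._rows[permit_id] = row
        return row

    def change(self, permit_id, **changes):
        # Any change to a served field touches updated_at, whether it came
        # from the permit itself or from a secondary source such as an
        # amended contractor form.
        row = self._rows[permit_id]
        if all(row.fields.get(key) == value for key, value in changes.items()):
            return row  # nothing served actually changed; leave timestamps alone
        row.fields.update(changes)
        row.updated_at = self._clock()
        return row
```

Under these assumptions, an amended contractor form would arrive as something like `data_set.change("P-1", contractor1="New Contractor LLC")`, and the served row's updated_at moves even though the permit itself was not edited.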

bensheldon avatar Sep 11 '15 16:09 bensheldon

Ah, that makes sense. Any thoughts on this @axtheset?

mheadd avatar Sep 11 '15 17:09 mheadd

This is very useful, but perhaps should be optional?

mmartin78 avatar Oct 26 '15 16:10 mmartin78

The benefit of this behavior is ensuring data integrity and improving the efficiency of data syncing between producers and consumers.
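As a rough illustration of the efficiency point, here is a sketch of an incremental sync in Python. It assumes each served row is a dict with hypothetical permit_id and updated_at keys and that updated_at is touched on every change; the function names are made up for this example.

```python
from datetime import datetime, timezone


def changed_since(rows, last_synced_at):
    """Return only the rows whose updated_at is newer than the last sync."""
    return [row for row in rows if row["updated_at"] > last_synced_at]


def sync(local, remote_rows, last_synced_at):
    """Copy changed rows into the local store and return the new high-water mark."""
    changed = changed_since(remote_rows, last_synced_at)
    for row in changed:
        local[row["permit_id"]] = row
    # If nothing changed, keep the previous high-water mark.
    return max((row["updated_at"] for row in changed), default=last_synced_at)
```

A consumer only has to fetch and compare rows newer than its last high-water mark. If updated_at is not reliably touched, this shortcut silently misses changes and the consumer has to fall back to diffing the full data set.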

To speak again from my experience with the Open311 specification: defining only a data schema (syntax), without defining the behaviors (semantics) of its fields, makes it very difficult to actually integrate systems that conform to the specification.

bensheldon avatar Oct 27 '15 17:10 bensheldon

I agree with defining the semantics; I just think it should be optional, because I bet most agencies don't capture this data at all today.

mmartin78 avatar Oct 30 '15 20:10 mmartin78

Maybe we should expand the discussion to canonical vs non-canonical fields. In asking for both timestamps and a semantics for timestamps, I don't have a preference for whether this represents canonical data (e.g. a datetime that's been stamped on the original form), or non-canonical data (the datetime that's stored in the intermediary database), other than to ask that the representation and semantics be defined as part of the spec.

bensheldon avatar Oct 30 '15 20:10 bensheldon