[RCP-36] Data Dictionary 2.1
There are a number of things to add to the current DD 2.0 specification for 2.1.
DATA DICTIONARY 2.1 DRAFT REFERENCE SHEET
Summary of Changes
1. Cross-Validation of Fields
- If both Country and StateOrProvince are present, validate against ISO 3166-2. Warn if the combination is not correct but don't fail yet.
2. Support for the RESO Universal Parcel Identifier (UPI)
- Three new fields have been added to support the RESO UPI: UniversalParcelId, CountrySubdivision, and ParcelSubcomponent.
- Read more about the UPI.
- If UPI-based fields are present in the Property Resource payload, they will be validated.
- UPI testing tools are available in the RESO SDK.
6. Previously-Excluded Schema Validation Items
- Some items were previously excluded from schema validation with the goal of enforcing them in the next version (2.1). Is this still the plan?
- See RESO SDK.
- Currently, these generate warnings.
- Most are related to schools. There are potential enumerations for them. These would work in the U.S., where the enumeration could be closed, with other countries TBD.
- ImageSizeDescription is currently an enumeration in the Data Dictionary. Should this be a string instead since it's a description?
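For item 1, a minimal sketch of the proposed Country/StateOrProvince cross-check might look like the following. It assumes that StateOrProvince holds the subdivision part of an ISO 3166-2 code (e.g. "TX" in "US-TX"), and it uses a hypothetical, deliberately tiny subset of the standard in place of the full list. The function name `cross_validate_location` is illustrative only. Per the draft rule, a bad combination yields a warning instead of a failure.

```python
from typing import List, Optional

# Hypothetical, deliberately tiny subset of ISO 3166-2 subdivision codes,
# keyed by country. A real validator would load the full standard.
ISO_3166_2_SUBSET = {
    "US": {"CA", "NY", "TX"},
    "CA": {"BC", "ON", "QC"},
}


def cross_validate_location(record: dict) -> List[str]:
    """Return warning messages for one record; an empty list means no problem.

    The check runs only when both Country and StateOrProvince are present,
    and it never raises, so a bad combination produces a warning rather
    than a failure.
    """
    country: Optional[str] = record.get("Country")
    state: Optional[str] = record.get("StateOrProvince")
    if not country or not state:
        return []  # Nothing to cross-check.
    subdivisions = ISO_3166_2_SUBSET.get(country)
    if subdivisions is None:
        return [f"WARNING: Country {country!r} is not in the ISO 3166-2 table"]
    if state not in subdivisions:
        return [f"WARNING: {country}-{state} is not an ISO 3166-2 subdivision"]
    return []


if __name__ == "__main__":
    print(cross_validate_location({"Country": "US", "StateOrProvince": "TX"}))  # []
    print(cross_validate_location({"Country": "CA", "StateOrProvince": "TX"}))
    # ['WARNING: CA-TX is not an ISO 3166-2 subdivision']
```

Returning warnings as data rather than raising keeps the "warn, don't fail yet" policy easy to tighten later, for example by turning a non-empty list into a failure in 2.1.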
Is there any room in the Data Dictionary for fields that
- the dictionary defines as String List, Single (or String List, Multi),
- but where the dictionary recognizes that the list of values is local and should not be dictated by the dictionary?
To me, schools, school districts, and MLS Areas (and even cities 😬) are all local concerns, so it seems like a feature of the dictionary, not a bug, that these fields are defined as String Lists but the values aren't provided by the dictionary.
It does seem fair to validate that the metadata and the sampled data do match. But I think it's completely reasonable to not have the dictionary enumerate all possible values itself.
Hi @bryanburgers! 👋
Thanks for the excellent question.
There are three kinds of enumerations currently in use by the Data Dictionary:
Open with Enumerations: A lookup (enumeration) list exists, but the list is still open and other values may be transmitted. Added items must be reasonably relevant to the definition of the given field.
Locked with Enumerations: A lookup (enumeration) exists and is finite. No other values may be used.
Open: No lookup (enumeration) list exists, and any relevant value may be transmitted. Sent Lookups must be reasonably relevant to the definition of the given field.
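To make the three kinds concrete, here is a small sketch of how a validator might classify a transmitted value under each one. The names `LookupKind` and `classify_value` are hypothetical, not part of any RESO tool. Whether a value is "reasonably relevant to the definition of the given field" cannot be checked mechanically, so values classified as "local" would still need human judgment.

```python
from enum import Enum
from typing import AbstractSet


class LookupKind(Enum):
    """The three kinds of enumerations described above (names are ours)."""
    OPEN_WITH_ENUMERATIONS = "Open with Enumerations"
    LOCKED_WITH_ENUMERATIONS = "Locked with Enumerations"
    OPEN = "Open"


def classify_value(kind: LookupKind, value: str, standard_values: AbstractSet[str]) -> str:
    """Classify a transmitted value as "standard", "local", or "rejected".

    For OPEN fields, pass an empty set: the dictionary defines no values.
    """
    if value in standard_values:
        return "standard"
    if kind is LookupKind.LOCKED_WITH_ENUMERATIONS:
        return "rejected"  # Locked lists are finite; nothing else may be used.
    return "local"  # Open lists accept other values, if they are relevant.


if __name__ == "__main__":
    example_values = {"Carpet", "Hardwood", "Tile"}  # hypothetical list
    print(classify_value(LookupKind.OPEN_WITH_ENUMERATIONS, "Cork", example_values))  # local
    print(classify_value(LookupKind.LOCKED_WITH_ENUMERATIONS, "Cork", example_values))  # rejected
    print(classify_value(LookupKind.OPEN, "Lincoln", frozenset()))  # local
```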
All of the fields that were ignored in DD 2.0 are Open, meaning that RESO doesn't offer any guidance and expects the provider to enumerate them:
Property
- MLSAreaMinor
- MLSAreaMajor
- MiddleOrJuniorSchoolDistrict
- ElementarySchool
- HighSchoolDistrict
- HighSchool
- ElementarySchoolDistrict
- MiddleOrJuniorSchool
Media
- ImageSizeDescription
ImageSizeDescription is an interesting one, since in practice some providers are using it for IDs. In other words, they are misusing the field according to the definition above, which states that "sent Lookups must be reasonably relevant to the definition of the given field."
However, there is some question around whether a size "description" field should be enumerated and if so, how? The definition of that field says, "A text description of the size of the image (i.e., Small, Thumbnail, Medium, Large, X-Large). The largest image must be described as "Largest," and the thumbnail must also be included. A pick list will remain open/extendable."
There are some interesting requirements defined with "must" here that people aren't following and RESO isn't enforcing. One option might be to consider changing this field to a string in the next major release unless some meaningful values could be defined? But "Large" or "Medium" for an image is fairly nebulous.
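If RESO did decide to enforce the two "must" requirements in the current definition, a check could be as simple as the sketch below. It assumes that Media records are grouped by their parent record through ResourceRecordKey and that the required values are matched exactly and case-sensitively as "Largest" and "Thumbnail"; both are assumptions, and `missing_required_sizes` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# The two values the field definition says "must" be present.
REQUIRED_SIZES = ("Largest", "Thumbnail")


def missing_required_sizes(media_records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each parent record key to the required size descriptions it lacks."""
    sizes_by_parent: Dict[str, Set[str]] = defaultdict(set)
    for media in media_records:
        sizes_by_parent[media["ResourceRecordKey"]].add(media.get("ImageSizeDescription"))
    problems: Dict[str, List[str]] = {}
    for parent_key, sizes in sizes_by_parent.items():
        missing = [size for size in REQUIRED_SIZES if size not in sizes]
        if missing:
            problems[parent_key] = missing
    return problems


if __name__ == "__main__":
    media = [
        {"ResourceRecordKey": "A", "ImageSizeDescription": "Largest"},
        {"ResourceRecordKey": "A", "ImageSizeDescription": "Thumbnail"},
        {"ResourceRecordKey": "B", "ImageSizeDescription": "Large"},
    ]
    print(missing_required_sizes(media))  # {'B': ['Largest', 'Thumbnail']}
```

Note that only these two values have a testable meaning; "Large" or "Medium" could not be checked this way, which is the nebulousness mentioned above.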
Regarding schools, there are potential sources we could use if the community felt enumerating them would be useful.
> It does seem fair to validate that the metadata and the sampled data do match. But I think it's completely reasonable to not have the dictionary enumerate all possible values itself.
This is the intent behind DD 2.0 testing: enumerated values should still be advertised by the provider in the metadata even if the Data Dictionary doesn't define them. So, I agree with your statement.
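As a rough illustration of "the metadata and the sampled data do match", the sketch below reports sampled values that the provider does not advertise in its Lookup resource rows. It makes a simplifying assumption that each field's LookupName equals the field name, and it expects multi-select values as lists. The helper names are hypothetical and do not describe how the RESO certification tools actually work.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def advertised_values(lookup_rows: Iterable[dict]) -> Dict[str, Set[str]]:
    """Index Lookup resource rows by LookupName."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for row in lookup_rows:
        index[row["LookupName"]].add(row["LookupValue"])
    return index


def unadvertised_values(
    records: Iterable[dict], lookup_rows: Iterable[dict], fields: List[str]
) -> Dict[str, Set[str]]:
    """Return, per field, sampled values that the metadata does not advertise."""
    advertised = advertised_values(lookup_rows)
    problems: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None:
                continue
            # Single-select values arrive as strings, multi-select as lists.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if item not in advertised.get(field, set()):
                    problems[field].add(item)
    return dict(problems)


if __name__ == "__main__":
    lookups = [{"LookupName": "ElementarySchool", "LookupValue": "Lincoln"}]
    records = [{"ElementarySchool": "Lincoln"}, {"ElementarySchool": "Washington"}]
    print(unadvertised_values(records, lookups, ["ElementarySchool"]))
    # {'ElementarySchool': {'Washington'}}
```

The dictionary itself contributes nothing to this check: the provider's own advertised list is the reference, which is what lets the dictionary leave these fields Open.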
On item 6, I know the words "SELECT DISTINCT" have been thrown around a bit. They refer to the idea that a provider might not have an actual enumeration for these fields, so they would look at all of the stored data they have for each field and return the distinct values (which in SQL parlance is done using a "SELECT DISTINCT" query).
If we require this for certification, some providers might do it solely to get certified.
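For readers less familiar with the SQL idiom, here is a self-contained sketch of deriving a "lookup list" from stored data with SELECT DISTINCT, using Python's built-in sqlite3 module. The Property table, its columns, and the school names are hypothetical.

```python
import sqlite3
from typing import List


def distinct_values(conn: sqlite3.Connection, table: str, column: str) -> List[str]:
    """Derive a 'lookup list' from stored data, the SELECT DISTINCT approach.

    Identifiers cannot be bound as SQL parameters, so table and column are
    assumed to be trusted names (e.g. from a fixed allow-list).
    """
    query = (
        f'SELECT DISTINCT "{column}" FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL ORDER BY 1'
    )
    return [row[0] for row in conn.execute(query)]


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical Property table with a free-text ElementarySchool column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Property (ListingKey TEXT, ElementarySchool TEXT)")
    conn.executemany(
        "INSERT INTO Property VALUES (?, ?)",
        [("1", "Lincoln"), ("2", "Washington"), ("3", "Lincoln"), ("4", None)],
    )
    return conn


if __name__ == "__main__":
    print(distinct_values(build_demo_db(), "Property", "ElementarySchool"))
    # ['Lincoln', 'Washington']
```

The result says nothing about when each value first appeared, and any typo in the stored data becomes a "lookup value" too.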
The Lookup resource has a ModificationTimestamp field that is intended to help facilitate replication of lookup values, which somewhat (waves hands, there may be clever ways around this) indicates that providers know what their lookup values are and know when they were added.
For some of these fields that are huge lists of previously potentially unenumerated values, trying to force providers to enumerate them mostly leaves providers trying to decide on whether to use a really old ModificationTimestamp (consequence: consumers don't get new values) or use a really new ModificationTimestamp (consequence: consumers redownload thousands of rows they already have every time they hit the endpoint).
That may be a point against using certification as a forcing function to get providers to list all of their school data?
(And no, I wasn't holding out on y'all during the Transport or Certification meetings; this occurred to me after those meetings.)
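The timestamp dilemma above can be made concrete with a small simulation. It assumes a provider that rebuilds its lookup rows from stored data (as with SELECT DISTINCT) and has to give every row the same synthetic ModificationTimestamp, and a client that pulls only rows modified after its last sync. All names and dates are hypothetical.

```python
from datetime import datetime, timezone
from typing import List


def derived_lookups(values: List[str], stamp: datetime) -> List[dict]:
    """Lookup rows built from stored data, all sharing one synthetic timestamp."""
    return [
        {"LookupName": "ElementarySchool", "LookupValue": v, "ModificationTimestamp": stamp}
        for v in values
    ]


def changed_since(rows: List[dict], since: datetime) -> List[dict]:
    """What an incremental client pulls: rows modified after its last sync."""
    return [r for r in rows if r["ModificationTimestamp"] > since]


last_sync = datetime(2025, 1, 1, tzinfo=timezone.utc)
request_time = datetime(2025, 1, 16, tzinfo=timezone.utc)  # stands in for "now"
current_values = ["Lincoln", "Washington", "Jefferson"]  # "Jefferson" is new

# Option A: a really old timestamp. The new value is never replicated.
old_stamp = derived_lookups(current_values, datetime(2000, 1, 1, tzinfo=timezone.utc))
# Option B: "now" on every request. Every row is re-downloaded each time.
new_stamp = derived_lookups(current_values, request_time)

if __name__ == "__main__":
    print(len(changed_since(old_stamp, last_sync)))  # 0
    print(len(changed_since(new_stamp, last_sync)))  # 3
```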
ModificationTimestamp is there to help clients synchronize.
So the first time they connect to the API, they'd pull everything that was there, and the timestamps could be anything. Then, if the provider adds or updates those values, they'd use the time the Lookup Resource record was added or updated, not some historical date.
If the provider also allowed new enumerations to be added, those records would be created in the Lookup Resource accordingly.
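A minimal sketch of the synchronization described above, assuming a provider-side store that stamps a lookup row only when it is actually added or changed. The `LookupStore` class and the simplified row shape are hypothetical. Over the API, the incremental pull would presumably correspond to an OData query such as `$filter=ModificationTimestamp gt <last timestamp>`.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class LookupStore:
    """Toy provider-side Lookup table that stamps rows only when they change."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def upsert(self, name: str, value: str, standard_value: Optional[str], now: datetime) -> None:
        key = (name, value)
        existing = self._rows.get(key)
        if existing is not None and existing["StandardLookupValue"] == standard_value:
            return  # Unchanged: keep the original timestamp.
        self._rows[key] = {
            "LookupName": name,
            "LookupValue": value,
            "StandardLookupValue": standard_value,
            "ModificationTimestamp": now,  # Time of this add or update.
        }

    def changed_since(self, since: Optional[datetime]) -> List[dict]:
        """Full pull when since is None, otherwise an incremental pull."""
        return [
            row for row in self._rows.values()
            if since is None or row["ModificationTimestamp"] > since
        ]


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


store = LookupStore()
store.upsert("ElementarySchool", "Lincoln", None, utc(2019, 5, 1))
store.upsert("ElementarySchool", "Washington", None, utc(2021, 8, 9))

first_pull = store.changed_since(None)  # Everything, whatever the timestamps.
cursor = max(row["ModificationTimestamp"] for row in first_pull)

store.upsert("ElementarySchool", "Lincoln", None, utc(2025, 1, 17))  # No-op.
store.upsert("ElementarySchool", "Jefferson", None, utc(2025, 1, 17))  # New value.

if __name__ == "__main__":
    print([r["LookupValue"] for r in store.changed_since(cursor)])  # ['Jefferson']
```

The strict "greater than" comparison is a simplification: a real client may prefer "greater than or equal" plus de-duplication, so rows sharing the cursor's exact timestamp are not missed.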
On Thu, Jan 16, 2025, 14:22 Bryan Burgers wrote:
> On item 6, I know the words "SELECT DISTINCT" have been thrown around a bit as a reference for the idea that a provider might just look at all of the data that they have for that field and return it.
> So some providers might want to just do that so they can be certified.
> The Lookup resource https://ddwiki.reso.org/display/DDW20/Lookup+Resource has a ModificationTimestamp field that is intended to help facilitate replication of lookup values, which somewhat (waves hands, there may be clever ways around this) indicates that providers know what their lookup values are and know when they were added.
> For some of these fields that are huge lists of previously potentially unenumerated values, trying to force providers to enumerate them mostly leaves providers trying to decide on whether to use a really old ModificationTimestamp (consequence: consumers don't get new values) or use a really new ModificationTimestamp (consequence: consumers redownload thousands of rows they already have every time they hit the endpoint).
> That may be a point against using certification as a forcing function to get providers to list all of their school data?
> (And no, I wasn't holding out on y'all during the Transport or Certification meetings; this occurred to me after those meetings.)
View it on GitHub: https://github.com/RESOStandards/transport/issues/158#issuecomment-2597028202