Support restrictions on data values
XML Schema, Data Package and JSON Schema all support restricting or permitting values that may appear in the data. Some of the common features are:
| restriction | XML Schema |
|---|---|
| allow nulls | `<xs:element … nillable="true"/>` |
| string length | `<xs:minLength value="5"/>` `<xs:maxLength value="20"/>` |
| regular expression | `<xs:pattern value="[a-z]{3}-\d{4}"/>` |
| range limits | `<xs:minInclusive value="2025-09-01"/>` `<xs:maxExclusive value="2025-10-01"/>` |
| enumeration | `<xs:enumeration value="lepton"/>` `<xs:enumeration value="boson"/>` `<xs:enumeration value="quark"/>` |
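
To make the restrictions in the table concrete, here is a rough Python sketch of what checking them against individual values could look like. The rule names (`nullable`, `min_length`, `max_length`, `pattern`, `min_inclusive`, `max_exclusive`, `enum`) and the `check_value` helper are hypothetical and chosen only for illustration; they are not part of Croissant or any existing specification.

```python
import re
from datetime import date


def check_value(value, rules):
    """Return a list of violation messages for one value (empty list = valid).

    `rules` is a plain dict using hypothetical keys that mirror the table:
    nullable, min_length, max_length, pattern, min_inclusive, max_exclusive, enum.
    """
    if value is None:
        # Nulls are rejected unless the rules explicitly allow them.
        return [] if rules.get("nullable", False) else ["null not allowed"]
    errors = []
    if "min_length" in rules and len(value) < rules["min_length"]:
        errors.append("shorter than min_length")
    if "max_length" in rules and len(value) > rules["max_length"]:
        errors.append("longer than max_length")
    # fullmatch, because XML Schema patterns must match the whole value.
    if "pattern" in rules and re.fullmatch(rules["pattern"], value) is None:
        errors.append("does not match pattern")
    if "min_inclusive" in rules and value < rules["min_inclusive"]:
        errors.append("below min_inclusive")
    if "max_exclusive" in rules and value >= rules["max_exclusive"]:
        errors.append("not below max_exclusive")
    if "enum" in rules and value not in rules["enum"]:
        errors.append("not in enum")
    return errors


# Rule sets mirroring the rows of the table.
code_rules = {"pattern": r"[a-z]{3}-\d{4}", "min_length": 5, "max_length": 20}
date_rules = {"min_inclusive": date(2025, 9, 1), "max_exclusive": date(2025, 10, 1)}
kind_rules = {"enum": {"lepton", "boson", "quark"}, "nullable": True}

print(check_value("abc-1234", code_rules))
print(check_value(date(2025, 10, 1), date_rules))
print(check_value("gluon", kind_rules))
```

A flat dict per field keeps the sketch small; a real proposal would need to decide where such rules live in the metadata and how they interact with data types.
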
Having data restrictions as part of Croissant could make it easier to determine whether data from different sources is comparable, and would help with "pushing left" on data quality so that errors can be caught before data enters a system.
If there's support for this, I can work on a strawman for some or all of these features.
I think this is a good idea!
+1 on this!
Note that JSON Schema also has the `format` keyword, which indicates a specific semantic meaning or expected format for a value, such as "date-time", "duration", "email", "hostname" or "uri".
https://json-schema.org/draft/2020-12/json-schema-validation#name-defined-formats
Further, high-performance JSON Schema validators such as the Rust crate at https://docs.rs/jsonschema/latest/jsonschema/ support custom formats, which the spec allows.
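
As a loose illustration of the custom-format idea, here is a minimal Python sketch of a registry that maps format names to check functions. It does not use or mirror the API of the validator linked above; `FORMAT_CHECKS`, `register_format`, `conforms` and the `particle-code` format are all hypothetical names, and the email check is a deliberately crude approximation.

```python
import re

# Hypothetical registry: format name -> function returning True if the text conforms.
FORMAT_CHECKS = {
    # Crude approximation of an email address, for illustration only.
    "email": lambda text: re.fullmatch(r"[^@\s]+@[^@\s]+", text) is not None,
}


def register_format(name, check):
    """Add or replace a named format check."""
    FORMAT_CHECKS[name] = check


def conforms(text, format_name):
    """Check text against a named format.

    Unknown format names pass; this is a simplifying choice made for this sketch.
    """
    check = FORMAT_CHECKS.get(format_name)
    return True if check is None else check(text)


# A made-up domain-specific format, registered alongside the built-in one.
register_format(
    "particle-code",
    lambda text: re.fullmatch(r"[a-z]{3}-\d{4}", text) is not None,
)

print(conforms("abc-1234", "particle-code"))
print(conforms("not-an-email", "email"))
```
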