
Support restrictions on data values

Open wardi opened this issue 5 months ago • 2 comments

XML Schema, Data Package and JSON Schema all support restricting or permitting values that may appear in the data. Some of the common features are:

| restriction | XML Schema |
| --- | --- |
| allow nulls | `<xs:element … nillable="true"/>` |
| string length | `<xs:minLength value="5"/>` `<xs:maxLength value="20"/>` |
| regular expression | `<xs:pattern value="[a-z]{3}-\d{4}"/>` |
| range limits | `<xs:minInclusive value="2025-09-01"/>` `<xs:maxExclusive value="2025-10-01"/>` |
| enumeration | `<xs:enumeration value="lepton"/>` `<xs:enumeration value="boson"/>` `<xs:enumeration value="quark"/>` |
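To make the restrictions above concrete, here is a minimal Python sketch of how each one might be checked against a single string value. The `Restriction` class, its field names, and the `violations` helper are all hypothetical and exist only for illustration; nothing here is part of Croissant or of any of the schema languages mentioned. Range limits are compared as plain strings, which only works for values such as ISO 8601 dates that sort lexicographically.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Restriction:
    """Hypothetical container for the restrictions in the table (not a Croissant API)."""
    nullable: bool = False
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    pattern: Optional[str] = None
    min_inclusive: Optional[str] = None
    max_exclusive: Optional[str] = None
    enumeration: Optional[Sequence[str]] = None


def violations(value: Optional[str], r: Restriction) -> list:
    """Return the names of the restrictions that `value` fails (empty list means valid)."""
    if value is None:
        return [] if r.nullable else ["nullable"]
    failed = []
    if r.min_length is not None and len(value) < r.min_length:
        failed.append("min_length")
    if r.max_length is not None and len(value) > r.max_length:
        failed.append("max_length")
    # XML Schema patterns match the whole value, hence fullmatch rather than search
    if r.pattern is not None and re.fullmatch(r.pattern, value) is None:
        failed.append("pattern")
    # Assumption: values are ISO 8601 dates, which order correctly as strings
    if r.min_inclusive is not None and value < r.min_inclusive:
        failed.append("min_inclusive")
    if r.max_exclusive is not None and value >= r.max_exclusive:
        failed.append("max_exclusive")
    if r.enumeration is not None and value not in r.enumeration:
        failed.append("enumeration")
    return failed


# Example restrictions mirroring the values in the table
code = Restriction(pattern=r"[a-z]{3}-\d{4}")
september = Restriction(min_inclusive="2025-09-01", max_exclusive="2025-10-01")
particle = Restriction(nullable=True, enumeration=["lepton", "boson", "quark"])
```

With these definitions, `violations("abc-1234", code)` returns an empty list, while `violations("2025-10-01", september)` reports `max_exclusive` because the upper bound is exclusive.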

Having data restrictions as part of Croissant could make it easier to determine whether data from different sources is comparable, and would help with "pushing left" on data quality so that errors are caught before data enters a system.

If there's support for this I can work on a strawman for some or all of these features.

wardi avatar Jul 30 '25 18:07 wardi

I think this is a good idea!

benjelloun avatar Aug 04 '25 09:08 benjelloun

+1 on this!

Note that JSON Schema also has the `format` keyword, which indicates a specific semantic meaning or expected format, such as "date-time", "duration", "email", hostnames, resource identifiers, etc.

https://json-schema.org/draft/2020-12/json-schema-validation#name-defined-formats

Further, high-performance JSON Schema validators like https://docs.rs/jsonschema/latest/jsonschema/ support custom formats, which the spec allows.
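As a rough illustration of what a custom format amounts to, the Python sketch below keeps a registry that maps a format name to a predicate function. Everything in it is hypothetical: `FORMAT_CHECKS`, `register_format`, `conforms`, and the made-up `part-code` format are not the API of the linked crate or of any other validator. Treating unregistered format names as passing is a choice made for this sketch only.

```python
import re
from typing import Callable, Dict

# Hypothetical registry mapping a format name to a predicate; not a real library API
FORMAT_CHECKS: Dict[str, Callable[[str], bool]] = {}


def register_format(name: str):
    """Decorator that registers a predicate under the given format name."""
    def decorator(check: Callable[[str], bool]) -> Callable[[str], bool]:
        FORMAT_CHECKS[name] = check
        return check
    return decorator


@register_format("part-code")  # made-up custom format, for illustration only
def is_part_code(value: str) -> bool:
    # Three lowercase letters, a hyphen, then four digits
    return re.fullmatch(r"[a-z]{3}-\d{4}", value) is not None


def conforms(value: str, format_name: str) -> bool:
    """Check a value against a named format; unregistered names pass in this sketch."""
    check = FORMAT_CHECKS.get(format_name)
    return True if check is None else check(value)
```

Here `conforms("abc-1234", "part-code")` is true and `conforms("ABC-1234", "part-code")` is false, since the pattern only allows lowercase letters.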

jqnatividad avatar Aug 13 '25 11:08 jqnatividad