Implement composite foreign keys

Open wardi opened this issue 5 months ago • 1 comments

It wasn't immediately clear how to represent composite primary keys in Croissant from the spec.

For a primary key it looks like we could define a type using a RecordSet dataType with all the columns that make up the key, then use that type as another RecordSet's key.

In a third RecordSet, can we then use that same composite type with a "references" to the first?

In Data Package and CSVW, composite foreign keys are simple because they are defined at the schema or table level instead of at the field level, so it's easy to list the corresponding columns, e.g.:

"foreignKeys": [{
  "fields": ["my_field_1", "my_field_2"],
  "reference": {
    "resource": "parent_table",
    "fields": ["their_field_1", "their_field_2"]
  }
}]
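
To make the semantics of such a table-level declaration concrete, here is a minimal Python sketch of how a consumer might check it. It is not part of Data Package, CSVW, or Croissant tooling: the function name `composite_fk_violations` and the sample rows are hypothetical, and the descriptor is the one shown above.

```python
def composite_fk_violations(child_rows, parent_rows, foreign_key):
    """Return child rows whose key tuple has no match in the parent rows."""
    child_fields = foreign_key["fields"]
    parent_fields = foreign_key["reference"]["fields"]
    # Build the set of key tuples that exist in the parent table.
    parent_keys = {tuple(row[f] for f in parent_fields) for row in parent_rows}
    return [
        row for row in child_rows
        if tuple(row[f] for f in child_fields) not in parent_keys
    ]


foreign_key = {
    "fields": ["my_field_1", "my_field_2"],
    "reference": {
        "resource": "parent_table",
        "fields": ["their_field_1", "their_field_2"],
    },
}

# Made-up rows, for illustration only.
parent_rows = [
    {"their_field_1": 1, "their_field_2": "a"},
    {"their_field_1": 2, "their_field_2": "b"},
]
child_rows = [
    {"my_field_1": 1, "my_field_2": "a"},
    {"my_field_1": 1, "my_field_2": "b"},  # (1, "b") is not a parent key
]

violations = composite_fk_violations(child_rows, parent_rows, foreign_key)
```

The columns are compared as one tuple, in the order the two "fields" lists give them, which is what makes the key composite.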

wardi, Jul 30 '25 17:07

Per https://docs.mlcommons.org/croissant/docs/croissant-spec.html#field, composite primary keys are supported by specifying multiple fields in the key of a RecordSet, e.g.:

{
  "@type": "cr:RecordSet",
  "@id": "ratings",
  "key": [{ "@id": "ratings/user_id" }, { "@id": "ratings/movie_id" }],
  "field": [
    {
      "@type": "cr:Field",
      "@id": "ratings/user_id",
      "dataType": "sc:Integer",
      "source": {
        "fileObject": { "@id": "ratings-table" },
        "extract": {
          "column": "userId"
        }
      }
    },
    {
      "@type": "cr:Field",
      "@id": "ratings/movie_id",
      "dataType": "sc:Integer",
      "source": {
        "fileObject": { "@id": "ratings-table" },
        "extract": {
          "column": "movieId"
        }
      },
      "references": {
        "field": { "@id": "movies/movie_id" }
      }
    }
  ]
}
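
As a rough illustration of what a composite "key" implies for the data, the sketch below checks that the combination of the key fields is unique across records. It does not use the mlcroissant library; `duplicate_keys` and the sample records are hypothetical, and the records are assumed to be plain dicts keyed by field `@id`.

```python
from collections import Counter


def duplicate_keys(records, key_fields):
    """Return the key tuples that occur more than once in the records."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]


# Made-up records, for illustration only.
ratings = [
    {"ratings/user_id": 1, "ratings/movie_id": 10},
    {"ratings/user_id": 1, "ratings/movie_id": 20},
    {"ratings/user_id": 2, "ratings/movie_id": 10},
]
key = ["ratings/user_id", "ratings/movie_id"]

dups = duplicate_keys(ratings, key)
```

Neither field is unique on its own here (user 1 and movie 10 both repeat), but each pair appears once, so the composite key holds.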

Composite foreign keys are not supported though, because "references" is defined at the field level. We would need to introduce a dedicated construct, as in Data Package or CSVW, to support that. I'd be in favor of doing that and deprecating the current "references" mechanism to avoid redundancy.
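
The gap can be illustrated with a small Python sketch, under the assumption that each field-level "references" is interpreted independently of the others. The function names and key values are hypothetical.

```python
def passes_per_field(child_keys, parent_keys):
    """Check each column on its own, as with one reference per field."""
    width = len(parent_keys[0])
    # One set of allowed values per column of the parent key.
    parent_columns = [{key[i] for key in parent_keys} for i in range(width)]
    return all(
        key[i] in parent_columns[i] for key in child_keys for i in range(width)
    )


def passes_composite(child_keys, parent_keys):
    """Check the columns together, as a composite foreign key would."""
    parent_set = set(parent_keys)
    return all(key in parent_set for key in child_keys)


# Made-up key values, for illustration only.
parent_keys = [(1, "a"), (2, "b")]
child_keys = [(1, "b")]

per_field_result = passes_per_field(child_keys, parent_keys)
composite_result = passes_composite(child_keys, parent_keys)
```

With this data the per-field check succeeds, because 1 and "b" each occur somewhere in the parent, while the composite check fails, because the pair (1, "b") does not.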

benjelloun, Jul 31 '25 14:07