linked-csv icon indicating copy to clipboard operation
linked-csv copied to clipboard

Support Deep-Hierarchic Structures in a Single Document

Open craig552uk opened this issue 11 years ago • 2 comments

The Linked CSV standard allows for multiple rows to be joined on the $id field. However this can only represent list structures. Extending this and introducing a new prolog line join would allow recursive structures of any depth to be represented in tabular format.

The examples below illustrate how this might work with University course data represented as a 3-layered hierarchical data structure: Department > Course > Requirement

courses.csv

 #    ,$id0 ,department       ,$id1     ,course_name         ,location          ,level         ,grades
 join ,     ,                 ,#courses ,#courses            ,#courses          ,#courses      ,#courses
 join ,     ,                 ,         ,                    ,#location         ,#requirements ,#requirements 
      ,#AMS ,American Studies ,         ,                    ,                  ,              ,
      ,#AMS ,                 ,#T700    ,American Studies BA ,                  ,              ,
      ,#AMS ,                 ,#T700    ,                    ,On Campus         ,A             ,BBB
      ,#AMS ,                 ,#T700    ,                    ,Distance Learning ,IB            ,Pass diploma with 30 points
      ,#AMS ,                 ,#T700    ,                    ,                  ,Access        ,Pass diploma with 30 level 3 credits
      ,#AMS ,                 ,#T700    ,                    ,                  ,BTEC          ,Pass diploma with DDM
      ,#AMS ,                 ,#T701    ,American Studies MA ,                  ,              ,
      ,#AMS ,                 ,#T701    ,                    ,On Campus         ,A             ,ABB
      ,#AMS ,                 ,#T701    ,                    ,Distance Learning ,IB            ,Pass diploma with 32 points
      ,#AMS ,                 ,#T701    ,                    ,                  ,Access        ,Pass diploma with 30 level 3 credits
      ,#AMS ,                 ,#T701    ,                    ,                  ,BTEC          ,Pass diploma with DDM

courses.json

[{
  "@id": "#AMS",
  "department": "American Studies",
  "courses":[{
    "@id": "#T700",
    "course_name": "American Studies BA",
    "location": ["On Campus", "Distance Learning"],
    "requirements": [
      {"level": "A",      "grades": "BBB"},
      {"level": "IB",     "grades": "Pass diploma with 30 points"},
      {"level": "Access", "grades": "Pass diploma with 30 level 3 credits"},
      {"level": "BTEC",   "grades": "Pass diploma with DDM"}
    ]
  },{
    "@id": "#T701",
    "course_name": "American Studies MA",
    "location": ["On Campus", "Distance Learning"],
    "requirements": [
      {"level": "A",      "grades": "ABB"},
      {"level": "IB",     "grades": "Pass diploma with 32 points"},
      {"level": "Access", "grades": "Pass diploma with 30 level 3 credits"},
      {"level": "BTEC",   "grades": "Pass diploma with DDM"}
    ]
  }]
}]

The $id0 field is used to join table rows as a single record in much the same way as $id is used in Linked CSV. Subsequent $id* fields are used to join rows at lower levels of the data structure.

$id* fields must use incremental integers specifying the level of the structure that they apply to. The exception being $id which is an alias for $id0.

The scope (across fields) of joins at each level are specified by use of the join prolog lines. join statements must be listed in increasing order of specificity of the structure. In the attached example all fields set to join under the #courses identifier are joined in to an object on the second tier of the structure.

A $id* field must have an associated join statement if it is greater than $id0. No join statement is needed to specify the scope of the top level ($id0) as it is assumed to be composed of all fields.

If a join statement is provided across multiple fields (e.g #requirements) those fields are joined in to an object. In this case if no $id* field is provided, each row of the table is considered to be a separate object.

If a join statement is provided across a single field (e.g. #location) that field is joined in to a list. No $id* field can be used in this case.

Multiple join scopes can be specified in a single statement, but all scopes in a statement must be contained within the scope of the preceding join statement (except the first).

I don't know if you think this enhancement is necessary in the Linked CSV spec, as deep hierarchic structures can be created by linking together multiple documents. But I think it would be nice to be able to accommodate this in a single file. What do you think?

craig552uk avatar Apr 26 '13 09:04 craig552uk