linked-csv
Support Deep-Hierarchic Structures in a Single Document
The Linked CSV standard allows multiple rows to be joined on the $id field. However, this can only represent list structures. Extending this by introducing a new prolog line, join, would allow recursive structures of any depth to be represented in tabular format.
The examples below illustrate how this might work, with university course data represented as a three-level hierarchical data structure: Department > Course > Requirement.
courses.csv
# ,$id0 ,department ,$id1 ,course_name ,location ,level ,grades
join , , ,#courses ,#courses ,#courses ,#courses ,#courses
join , , , , ,#location ,#requirements ,#requirements
,#AMS ,American Studies , , , , ,
,#AMS , ,#T700 ,American Studies BA , , ,
,#AMS , ,#T700 , ,On Campus ,A ,BBB
,#AMS , ,#T700 , ,Distance Learning ,IB ,Pass diploma with 30 points
,#AMS , ,#T700 , , ,Access ,Pass diploma with 30 level 3 credits
,#AMS , ,#T700 , , ,BTEC ,Pass diploma with DDM
,#AMS , ,#T701 ,American Studies MA , , ,
,#AMS , ,#T701 , ,On Campus ,A ,ABB
,#AMS , ,#T701 , ,Distance Learning ,IB ,Pass diploma with 32 points
,#AMS , ,#T701 , , ,Access ,Pass diploma with 30 level 3 credits
,#AMS , ,#T701 , , ,BTEC ,Pass diploma with DDM
courses.json
[{
  "@id": "#AMS",
  "department": "American Studies",
  "courses": [{
    "@id": "#T700",
    "course_name": "American Studies BA",
    "location": ["On Campus", "Distance Learning"],
    "requirements": [
      {"level": "A", "grades": "BBB"},
      {"level": "IB", "grades": "Pass diploma with 30 points"},
      {"level": "Access", "grades": "Pass diploma with 30 level 3 credits"},
      {"level": "BTEC", "grades": "Pass diploma with DDM"}
    ]
  }, {
    "@id": "#T701",
    "course_name": "American Studies MA",
    "location": ["On Campus", "Distance Learning"],
    "requirements": [
      {"level": "A", "grades": "ABB"},
      {"level": "IB", "grades": "Pass diploma with 32 points"},
      {"level": "Access", "grades": "Pass diploma with 30 level 3 credits"},
      {"level": "BTEC", "grades": "Pass diploma with DDM"}
    ]
  }]
}]
The $id0 field is used to join table rows into a single record, in much the same way as $id is used in Linked CSV. Subsequent $id* fields are used to join rows at lower levels of the data structure.
$id* fields must use incremental integers specifying the level of the structure that they apply to. The exception is $id, which is an alias for $id0.
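As a rough illustration of the $id* naming rule, the sketch below maps a field name to its level and checks a header for gaps. The helper names id_level and check_id_levels are hypothetical, and reading "incremental integers" as "consecutive levels starting at 0" is an assumption rather than something the proposal states outright.

```python
import re

def id_level(field_name):
    """Return the level of a $id* field, or None for an ordinary field.

    A bare $id is treated as an alias for $id0.
    """
    match = re.fullmatch(r"\$id(\d*)", field_name)
    if match is None:
        return None
    return int(match.group(1) or 0)

def check_id_levels(header):
    """True if the $id* fields in a header cover levels 0, 1, 2, ... with no gaps.

    Assumption: "incremental" means consecutive levels starting at 0.
    """
    levels = sorted(lvl for lvl in map(id_level, header) if lvl is not None)
    return levels == list(range(len(levels)))
```

With the header from courses.csv, check_id_levels would accept $id0 followed by $id1, while a header that jumps from $id0 to $id2 would be rejected under this reading.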
The scope (across fields) of the joins at each level is specified by the join prolog lines. join statements must be listed in order of increasing specificity within the structure.
In the example above, all fields set to join under the #courses identifier are joined into an object on the second tier of the structure.
Any $id* field at a level greater than $id0 must have an associated join statement.
No join statement is needed to specify the scope of the top level ($id0), as it is assumed to be composed of all fields.
If a join statement is provided across multiple fields (e.g. #requirements), those fields are joined into an object. In this case, if no $id* field is provided, each row of the table is considered to be a separate object.
If a join statement is provided across a single field (e.g. #location), that field is joined into a list. No $id* field can be used in this case.
Multiple join scopes can be specified in a single statement, but every scope in a statement (other than the first statement) must be contained within a scope of the preceding join statement.
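To make the rules above more concrete, here is a minimal Python sketch of how a reader might fold such a table into nested records. It is not part of Linked CSV or any existing library: the function name parse_nested_csv is hypothetical, and it assumes that header rows start with #, prolog rows start with join, data rows start with an empty cell, and the table has a top-level $id or $id0 column.

```python
import csv
import io

def parse_nested_csv(text):
    """Fold a table with join prolog lines into nested records (sketch only)."""
    rows = [[cell.strip() for cell in r] for r in csv.reader(io.StringIO(text)) if r]
    header = next(r for r in rows if r[0] == "#")
    joins = [r for r in rows if r[0] == "join"]  # outermost scope first
    data = [r for r in rows if r[0] == ""]
    cols = range(1, len(header))

    def is_id(c):
        return header[c].startswith("$id")

    # Scope path of each column, e.g. grades -> ("courses", "requirements").
    path = {c: tuple(j[c].lstrip("#") for j in joins if c < len(j) and j[c])
            for c in cols}
    # The $id* column (if any) that keys each scope; () is the top level.
    id_col = {path[c]: c for c in cols if is_id(c)}
    # Number of ordinary fields that sit directly in each scope.
    n_fields = {}
    for c in cols:
        if not is_id(c):
            n_fields[path[c]] = n_fields.get(path[c], 0) + 1

    top = {}
    for row in data:
        row = row + [""] * (len(header) - len(row))
        top_id = row[id_col[()]]
        if not top_id:
            continue
        record = top.setdefault(top_id, {"@id": top_id})
        row_objs = {}  # one anonymous object per unkeyed scope, per row
        for c in cols:
            if is_id(c) or not row[c]:
                continue
            node, scope, stored = record, (), False
            for name in path[c]:
                scope += (name,)
                children = node.setdefault(name, [])
                if scope in id_col:
                    # Keyed scope: merge rows that share the same $id* value.
                    key = row[id_col[scope]]
                    child = next((o for o in children if o["@id"] == key), None)
                    if child is None:
                        child = {"@id": key}
                        children.append(child)
                    node = child
                elif scope == path[c] and n_fields.get(scope) == 1:
                    # Single-field scope: collect the values into a list.
                    children.append(row[c])
                    stored = True
                else:
                    # Multi-field scope without an id: one object per row.
                    if scope not in row_objs:
                        row_objs[scope] = {}
                        children.append(row_objs[scope])
                    node = row_objs[scope]
            if not stored:
                node[header[c]] = row[c]
    return list(top.values())
```

Passing the text of courses.csv to parse_nested_csv is intended to yield the structure shown in courses.json. The sketch does no validation, so malformed join lines or missing $id* values are not reported.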
I don't know if you think this enhancement is necessary in the Linked CSV spec, as deep hierarchic structures can be created by linking together multiple documents. But I think it would be nice to be able to accommodate this in a single file. What do you think?