If statement will not run for valueUrl
The exercise on the wiki page for the if statement doesn't give the expected results. I included the prefix "sdmx-code": "http://purl.org/linked-data/sdmx/2009/code#" in the JSON schema for cow_person_example.csv and replaced the "male" column definition with the following:
{
"name": "male",
"datatype": "string",
"@id": "https://iisg.amsterdam/cow_person_example.csv/column/male",
"dc:description": "The state of being male or female",
"titles": ["male"],
"propertyUrl": "sdmx-code:sex",
"valueUrl": "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
},
The valueUrl does not seem to accept the if statement, whereas "csvw:value" with the same template runs without issues.
This might be related to #148.
I took a brief look. The code responsible for processing the valueUrl spans lines 587 to 625. The expand_url function is called on the value, which in turn calls the render_pattern function to render it with the Jinja2 backend. Something goes wrong there, but I'm not yet sure what. I'll look into it more later.
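For reference, the intended behaviour can be reproduced outside COW: render the valueUrl as a Jinja2 template against a single row, then expand the `sdmx-code:` prefix into a full IRI. The function names mirror `render_pattern` and `expand_url` mentioned above, but their bodies, the `PREFIXES` map and the row dictionaries are hypothetical simplifications, not COW's actual code.

```python
from jinja2 import Template

# Hypothetical prefix map; COW derives its own prefixes from the metadata context.
PREFIXES = {"sdmx-code": "http://purl.org/linked-data/sdmx/2009/code#"}


def render_pattern(pattern: str, row: dict) -> str:
    """Render a Jinja2 template with the row's cell values as variables (simplified)."""
    return Template(pattern).render(**row)


def expand_url(pattern: str, row: dict) -> str:
    """Render the template, then expand a known CURIE prefix into a full IRI (simplified)."""
    rendered = render_pattern(pattern, row)
    prefix, sep, local = rendered.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return rendered


pattern = "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
print(expand_url(pattern, {"male": "0"}))  # http://purl.org/linked-data/sdmx/2009/code#sex-F
print(expand_url(pattern, {"male": "1"}))  # http://purl.org/linked-data/sdmx/2009/code#sex-M
```

Since the template renders correctly in isolation, a missing triple suggests the pattern is being altered before it ever reaches the rendering step.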
I tried to replicate this and get the same issue: it runs fine with csvw:value, but no triples are generated when using valueUrl. I don't recall this being a problem before.
Hi @wxwilcke, it could also be that the output was always missing and we never noticed...
Could this be solved relatively easily by moving a line like the one at https://github.com/CLARIAH/COW/blob/base/src/converter/csvw.py#L629, or is there more complexity to it?
After a lot of testing I discovered that recent versions of the rdflib JSON-LD parser won't process URIs that contain white space. Normally this would be a good thing, but for some reason COW reads the metadata.json file as a JSON-LD file. As a result, the Jinja pattern in the valueUrl property is ignored and replaced by the base URI:
>>> import rdflib
>>> metadata_graph = rdflib.Graph()
>>> metadata_graph.load('../test/cow_person_example.csv-metadata.json', format='json-ld')
<Graph identifier=N9cadb1c623b84947975324b58c3ce06b (<class 'rdflib.graph.Graph'>)>
>>> for t in metadata_graph.triples((rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'),None,None)):
...     print(t)
...
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#name'), rdflib.term.Literal('male'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#datatype'), rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://purl.org/dc/terms/description'), rdflib.term.Literal('The state of being male or female', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#title'), rdflib.term.Literal('male', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#propertyUrl'), rdflib.term.URIRef('http://purl.org/linked-data/sdmx/2009/code#sex'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#valueUrl'), rdflib.term.URIRef('https://example.com/id/'))
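Reading the same metadata with Python's standard `json` module, rather than as JSON-LD, keeps each template string exactly as written. The sketch below uses that to list the URI templates containing white space, i.e. the ones a strict JSON-LD parser may silently drop. The helper name is hypothetical, and it assumes the CSVW layout in which columns sit under `tableSchema.columns`.

```python
from __future__ import annotations

import json

# CSVW column properties whose values are URI templates.
URI_TEMPLATE_KEYS = ("aboutUrl", "propertyUrl", "valueUrl")


def templates_with_whitespace(metadata: dict) -> list[tuple[str, str, str]]:
    """Return (column name, property, template) for each URI template containing white space."""
    hits = []
    for column in metadata.get("tableSchema", {}).get("columns", []):
        for key in URI_TEMPLATE_KEYS:
            template = column.get(key)
            if isinstance(template, str) and any(ch.isspace() for ch in template):
                hits.append((column.get("name", "?"), key, template))
    return hits


# Inline stand-in for the metadata file; for a real file, load it with json.load().
raw = """{"tableSchema": {"columns": [
  {"name": "male", "propertyUrl": "sdmx-code:sex",
   "valueUrl": "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"}
]}}"""

for name, key, template in templates_with_whitespace(json.loads(raw)):
    print(f"{name}.{key}: {template!r}")
```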
Ideally, COW would be rewritten to read the file as plain JSON, but this would require quite a bit of work. Instead, I fixed the issue by adding code that replaces underscores ('_') with white space. However, this means the metadata file now has to be updated by replacing all white space in the valueUrl value with underscores:
{
"name": "male",
"datatype": "string",
"@id": "https://iisg.amsterdam/cow_person_example.csv/column/male",
"dc:description": "The state of being male or female",
"titles": ["male"],
"propertyUrl": "sdmx-code:sex",
"valueUrl": "sdmx-code:{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"
},
With this change, the Jinja pattern survives parsing by the JSON-LD parser:
>>> metadata_graph = rdflib.Graph()
>>> metadata_graph.load('../test/cow_person_example.csv-metadata.json', format='json-ld')
<Graph identifier=Nac44aa43f124410492df49bdc00fa9ad (<class 'rdflib.graph.Graph'>)>
>>> for t in metadata_graph.triples((rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'),None,None)):
...     print(t)
...
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#name'), rdflib.term.Literal('male'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#datatype'), rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://purl.org/dc/terms/description'), rdflib.term.Literal('The state of being male or female', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#title'), rdflib.term.Literal('male', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#propertyUrl'), rdflib.term.URIRef('http://purl.org/linked-data/sdmx/2009/code#sex'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#valueUrl'), rdflib.term.URIRef("http://purl.org/linked-data/sdmx/2009/code#{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"))
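Assuming the fix simply turns every underscore in a valueUrl back into a space before rendering (the actual code in branch issue148 may be more selective), the decoding step and its main risk can be sketched as follows. The helper and the `birth_year` column are hypothetical; the point is that underscores belonging to the template itself, such as those in column names, get rewritten too, which is the kind of pattern worth testing.

```python
def restore_spaces(template: str) -> str:
    """Naive decoding: turn every underscore back into a space."""
    return template.replace("_", " ")


# The workaround pattern from the metadata file decodes as intended.
encoded = "sdmx-code:{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"
print(restore_spaces(encoded))
# sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}

# A hypothetical column name containing an underscore is mangled as well,
# turning a valid variable reference into invalid Jinja syntax.
print(restore_spaces("{{birth_year}}"))
# {{birth year}}
```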
I've uploaded the fix as branch issue148. @rijpma @sytzevh could you please try it? Could you also check that the fix doesn't break other Jinja patterns? Instead of installing it with pip, clone the branch and call csvw_tool.py directly:
git clone https://github.com/CLARIAH/COW.git
cd COW
git checkout issue148
python ./src/csvw_tool.py build cow_person_example.csv
python ./src/csvw_tool.py convert cow_person_example.csv