
If statement will not run for valueUrl

Open · sytzevh opened this issue 2 years ago · 5 comments

The exercise on the if statement wiki page doesn't give the expected results. I added the prefix "sdmx-code": "http://purl.org/linked-data/sdmx/2009/code#" to the JSON schema of cow_person_example.csv and replaced the "male" column with the following code:

{
    "name": "male",
    "datatype": "string",
    "@id": "https://iisg.amsterdam/cow_person_example.csv/column/male",
    "dc:description": "The state of being male or female",
    "titles": ["male"],
    "propertyUrl": "sdmx-code:sex",
    "valueUrl": "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
},

The valueUrl does not seem to accept the if statement, whereas a "csvw:value" with the same code runs without issues.
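For reference, the conditional in that valueUrl is ordinary Jinja2 syntax. A minimal sketch, assuming the jinja2 package and that the cell value reaches the template as a string under its column name (`male`), shows what the template should render to before the `sdmx-code:` prefix is expanded:

```python
from jinja2 import Template

# The valueUrl template from the metadata above, unchanged.
VALUE_URL = "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"


def render_value_url(male: str) -> str:
    """Render the valueUrl for one row, assuming the cell value is passed as `male`."""
    return Template(VALUE_URL).render(male=male)


print(render_value_url("0"))  # sdmx-code:sex-F
print(render_value_url("1"))  # sdmx-code:sex-M
```

The comparison is against the string '0', so if a converter passed the cell as an integer instead, every row would fall through to sex-M.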

sytzevh · Jan 20 '23 10:01

This might be related to #148.

I took a brief look. The part of the code responsible for processing the valueUrl runs from line 587 to line 625. The expand_url function is then called on the value, which in turn calls the render_pattern function to convert the value using the Jinja2 backend. Something goes wrong there, but I'm not yet sure what. I'll look into it some more later.
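The call chain described above can be sketched roughly as follows. This is not COW's code: the bodies of `render_pattern` and `expand_url`, the `PREFIXES` table, and rendering before expanding the prefix are all assumptions for illustration. It only shows that, if the template arrives intact, a render-then-expand step yields the expected SDMX IRI:

```python
from jinja2 import Template

# Hypothetical prefix table; in the issue this mapping comes from the
# metadata's context ("sdmx-code" -> the SDMX code namespace).
PREFIXES = {"sdmx-code": "http://purl.org/linked-data/sdmx/2009/code#"}


def render_pattern(pattern: str, row: dict) -> str:
    """Hypothetical stand-in for render_pattern: fill the Jinja2 template from one CSV row."""
    return Template(pattern).render(**row)


def expand_url(pattern: str, row: dict) -> str:
    """Hypothetical stand-in for expand_url: render, then expand a leading CURIE prefix."""
    rendered = render_pattern(pattern, row)
    prefix, sep, local = rendered.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return rendered  # already a full IRI, or a prefix we don't know


pattern = "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
print(expand_url(pattern, {"male": "0"}))
# http://purl.org/linked-data/sdmx/2009/code#sex-F
```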

wxwilcke · Jan 27 '23 14:01

I tried to replicate this and get the same issue: it runs fine with csvw:value, but no triples are generated when using valueUrl. I don't recall having issues with this before.

rijpma · Jan 27 '23 15:01

Hi @wxwilcke, though it could be that the output was always missing and we never noticed...

rijpma · Jan 27 '23 15:01

Could this potentially be solved relatively easily by moving a line like https://github.com/CLARIAH/COW/blob/base/src/converter/csvw.py#L629 ? Or is there a lot more complexity to that?

rijpma · Jan 27 '23 16:01

After a lot of testing I discovered that recent versions of rdflib's json-ld parser won't process URIs that contain whitespace. Normally this would be a good thing, but for some reason COW reads the metadata.json file as a json-ld file. As a result, the jinja pattern in the valueUrl tag is ignored and replaced by the base URI:

>>> import rdflib
>>> metadata_graph = rdflib.Graph()
>>> metadata_graph.parse('../test/cow_person_example.csv-metadata.json', format='json-ld')
<Graph identifier=N9cadb1c623b84947975324b58c3ce06b (<class 'rdflib.graph.Graph'>)>
>>> for t in metadata_graph.triples((rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'),None,None)):
...     print(t)
... 
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#name'), rdflib.term.Literal('male'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#datatype'), rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://purl.org/dc/terms/description'), rdflib.term.Literal('The state of being male or female', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#title'), rdflib.term.Literal('male', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#propertyUrl'), rdflib.term.URIRef('http://purl.org/linked-data/sdmx/2009/code#sex'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#valueUrl'), rdflib.term.URIRef('https://example.com/id/'))
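The template is only lost when the file goes through the json-ld parser; reading the same file with Python's json module keeps it as a plain string. A small sketch (the function name is hypothetical, and it assumes the usual CSVW layout with `tableSchema` → `columns`) lists which URL templates contain whitespace and would therefore be affected:

```python
import json
import re

# URL-valued CSVW column properties that may hold templates.
TEMPLATE_KEYS = ("aboutUrl", "propertyUrl", "valueUrl")


def whitespace_templates(metadata: dict) -> list:
    """Return (column name, key, template) for every URL template containing whitespace.

    Assumes metadata["tableSchema"]["columns"] is a list of column dicts.
    """
    hits = []
    for column in metadata.get("tableSchema", {}).get("columns", []):
        for key in TEMPLATE_KEYS:
            value = column.get(key)
            if isinstance(value, str) and re.search(r"\s", value):
                hits.append((column.get("name"), key, value))
    return hits


# Inline stand-in for json.load(open("cow_person_example.csv-metadata.json")).
metadata = json.loads("""
{"tableSchema": {"columns": [
  {"name": "male", "propertyUrl": "sdmx-code:sex",
   "valueUrl": "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"}
]}}
""")
for hit in whitespace_templates(metadata):
    print(hit)
```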

Ideally, COW would be rewritten to read the file as plain json, but that would require quite a bit of work. Instead, I fixed the issue by adding some code that replaces underscores ('_') with spaces. However, this means the metadata file now has to be updated by replacing all whitespace in the valueUrl value with underscores:

{
    "name": "male",
    "datatype": "string",
    "@id": "https://iisg.amsterdam/cow_person_example.csv/column/male",
    "dc:description": "The state of being male or female",
    "titles": ["male"],
    "propertyUrl": "sdmx-code:sex",
    "valueUrl": "sdmx-code:{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"
},

This allows the jinja patterns to be read by the json-ld parser:

>>> metadata_graph = rdflib.Graph()
>>> metadata_graph.parse('../test/cow_person_example.csv-metadata.json', format='json-ld')
<Graph identifier=Nac44aa43f124410492df49bdc00fa9ad (<class 'rdflib.graph.Graph'>)>
>>> for t in metadata_graph.triples((rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'),None,None)):
...     print(t)
... 
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#name'), rdflib.term.Literal('male'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#datatype'), rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://purl.org/dc/terms/description'), rdflib.term.Literal('The state of being male or female', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#title'), rdflib.term.Literal('male', lang='en'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#propertyUrl'), rdflib.term.URIRef('http://purl.org/linked-data/sdmx/2009/code#sex'))
(rdflib.term.URIRef('https://iisg.amsterdam/cow_person_example.csv/column/male'), rdflib.term.URIRef('http://www.w3.org/ns/csvw#valueUrl'), rdflib.term.URIRef("http://purl.org/linked-data/sdmx/2009/code#{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}"))
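The full round trip can be sketched as below. It assumes the decoding step is a plain replace of every '_' over the whole string (the branch may scope it more narrowly) and simulates the parser's prefix expansion with string concatenation; `encode` and `decode` are hypothetical names:

```python
import re

from jinja2 import Template

ORIGINAL = "sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}"
NAMESPACE = "http://purl.org/linked-data/sdmx/2009/code#"


def encode(template: str) -> str:
    """What the metadata author now does by hand: each whitespace character becomes '_'."""
    return re.sub(r"\s", "_", template)


def decode(template: str) -> str:
    """Assumed behaviour of the fix: every '_' becomes a space again."""
    return template.replace("_", " ")


encoded = encode(ORIGINAL)
print(encoded)  # sdmx-code:{%_if_male_==_'0'_%}sex-F{%_else_%}sex-M{%_endif_%}

# Simulate the json-ld parser expanding the prefix (the last triple above).
parsed = NAMESPACE + encoded.split(":", 1)[1]
for male in ("0", "1"):
    print(Template(decode(parsed)).render(male=male))
# http://purl.org/linked-data/sdmx/2009/code#sex-F
# http://purl.org/linked-data/sdmx/2009/code#sex-M
```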

I've pushed the fix to the branch issue148. @rijpma @sytzevh, could you please try it? Could you also check that the fix doesn't break other jinja patterns? Instead of installing COW with pip, clone the branch and call csvw_tool.py directly:

git clone https://github.com/CLARIAH/COW.git
cd COW
git checkout issue148
python ./src/csvw_tool.py build cow_person_example.csv
python ./src/csvw_tool.py convert cow_person_example.csv
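One way to check for broken patterns without running a full conversion is a small round-trip test. This sketch assumes the fix turns every '_' back into a space; the templates and column names (`birth_year`, `id`) are hypothetical examples, not taken from the COW test data:

```python
from jinja2 import Template, TemplateSyntaxError


def round_trip(template: str) -> str:
    # Encode as the metadata now requires (space -> '_'), then decode as the
    # fix is assumed to do ('_' -> space).
    return template.replace(" ", "_").replace("_", " ")


def survives_round_trip(template: str, row: dict) -> bool:
    """True if the round-tripped template renders exactly like the original."""
    expected = Template(template).render(**row)
    try:
        actual = Template(round_trip(template)).render(**row)
    except TemplateSyntaxError:
        return False
    return actual == expected


cases = [
    ("sdmx-code:{% if male == '0' %}sex-F{% else %}sex-M{% endif %}", {"male": "0"}),
    ("https://example.org/year/{{ birth_year }}", {"birth_year": "1900"}),
    ("https://example.org/person_{{ id }}", {"id": "1"}),
]
for template, row in cases:
    print(survives_round_trip(template, row), template)
```

Under that assumption the first template survives, but a column name containing an underscore becomes `{{ birth year }}` (a syntax error), and a literal underscore in the URL turns into a space. Restricting the decode to the jinja delimiters would fix the third case but not the second.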

wxwilcke · Jan 30 '23 13:01