Input file path and type not abstracted from rml mapping

Open henrieglesorotos opened this issue 2 years ago • 8 comments

Currently, the input file can't be parameterised via the CLI or the API; it is hard-coded in the mapping file, e.g.:

rml:logicalSource [ 
    rml:source "./examples/artists/Artist.csv" ;
    rml:referenceFormulation ql:CSV
  ]

It would be more flexible to be able to provide this as a parameter.

henrieglesorotos, Nov 09 '23 13:11

Reckon it's something we could work on, @anuzzolese? Also, are there any tests?

henrieglesorotos, Nov 09 '23 13:11

Hi @henrieglesorotos, if I've understood the problem correctly, this is already partly implemented (maybe not the best solution, but we can discuss improvements). pyrml supports parametrising RML mapping files through Jinja2 templating.

RML files processed by pyrml can take parameters using Jinja2 placeholder syntax, e.g.:

rml:logicalSource [ 
    rml:source "{{ source_file }}" ;
    rml:referenceFormulation ql:CSV
  ]
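Jinja2 substitutes the value as plain text, so whatever surrounds the placeholder ends up in the Turtle verbatim. Because rml:source expects a string literal, the quotes have to be part of the template (or of the value itself). The quick check below uses jinja2.Template on its own, outside pyrml. It assumes pyrml renders the file without adding any escaping of its own:

```python
from jinja2 import Template

# The same logicalSource line, with and without quotes around the placeholder.
UNQUOTED = 'rml:source {{ source_file }} ;'
QUOTED = 'rml:source "{{ source_file }}" ;'

values = {"source_file": "./examples/artists/Artist.csv"}

unquoted = Template(UNQUOTED).render(**values)
quoted = Template(QUOTED).render(**values)

print(unquoted)  # rml:source ./examples/artists/Artist.csv ;   (not a Turtle literal)
print(quoted)    # rml:source "./examples/artists/Artist.csv" ;
```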

Then, when you instantiate the mapper in your Python code, you can do something like this:

from pyrml import RMLConverter
from rdflib import Graph

rml_map_file: str = '/path_to_your_rml'

# Map each parameter used in the RML file (here 'source_file') to its actual value.
# Named template_vars rather than vars to avoid shadowing the built-in vars().
template_vars = {'source_file': './examples/artists/Artist.csv'}

rml_mapper: RMLConverter = RMLConverter.get_instance()
g: Graph = rml_mapper.convert(rml_map_file, template_vars=template_vars)

anuzzolese, Nov 09 '23 13:11
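The convert(..., template_vars=...) call covers the API side of the request. The issue also notes that the CLI has no way to pass parameters. Until it does, one workaround is to render the mapping first and then hand the result to the existing CLI. The script below is a hypothetical helper built on Jinja2 and argparse. The names render_mapping.py, parse_vars, render_mapping and the --var flag are illustrative and not part of pyrml:

```python
import argparse
from pathlib import Path
from typing import Dict, List, Optional

from jinja2 import StrictUndefined, Template


def parse_vars(pairs: List[str]) -> Dict[str, str]:
    """Turn ['source_file=./in.csv'] into {'source_file': './in.csv'}."""
    template_vars: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")  # split on the first '=' only
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        template_vars[key] = value
    return template_vars


def render_mapping(template_text: str, template_vars: Dict[str, str]) -> str:
    """Fill the Jinja2 placeholders of an RML mapping template.

    StrictUndefined raises on any placeholder without a value;
    keep_trailing_newline preserves the file's final newline.
    """
    template = Template(template_text, undefined=StrictUndefined, keep_trailing_newline=True)
    return template.render(**template_vars)


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Render a parameterised RML mapping.")
    parser.add_argument("template", help="RML mapping with {{ ... }} placeholders")
    parser.add_argument("-o", "--output", required=True, help="path for the rendered mapping")
    parser.add_argument("--var", action="append", default=[], metavar="KEY=VALUE",
                        help="template variable (repeatable)")
    args = parser.parse_args(argv)

    text = Path(args.template).read_text(encoding="utf-8")
    rendered = render_mapping(text, parse_vars(args.var))
    Path(args.output).write_text(rendered, encoding="utf-8")


if __name__ == "__main__":
    main()
```

For example, run python render_mapping.py mapping.tmpl.ttl -o mapping.ttl --var source_file=./examples/artists/Artist.csv, then python converter.py -o out.ttl mapping.ttl as usual. StrictUndefined makes a forgotten --var fail loudly. Without it, the placeholder would silently render as an empty string.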

This is excellent news! Can we add this to the docs? Also, shall we create some simple tests if they don't exist yet?

henrieglesorotos, Nov 09 '23 13:11

Yes, contributing documentation and how-to guides would be extremely helpful.

anuzzolese, Nov 09 '23 13:11

@anuzzolese

I'm having some issues; see the example below.

We have some pre-existing RML rules in mapping.ttl:

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#>.
@prefix fno: <https://w3id.org/function/ontology#>.
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#>.
@prefix void: <http://rdfs.org/ns/void#>.
@prefix dc: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix : <http://mapping.example.com/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix industries: <https://data.beamery.com/naics/2022/industries/>.

:rules_000 a void:Dataset.
:source_000 a rml:LogicalSource;
    rml:source "input.json";
    rml:iterator "$";
    rml:referenceFormulation ql:JSONPath.
:rules_000 void:exampleResource :map_Concept_000.
:map_Concept_000 rml:logicalSource :source_000;
    a rr:TriplesMap;
    rdfs:label "Concept".
:s_000 a rr:SubjectMap.
:map_Concept_000 rr:subjectMap :s_000.
:s_000 rr:template "https://data.beamery.com/naics/2022/industries/{NAICS22}#this";
    rr:graphMap :gm_000.
:gm_000 a rr:GraphMap;
    rr:template "https://data.beamery.com/naics/2022/industries/{NAICS22}".
:pom_000 a rr:PredicateObjectMap.
:map_Concept_000 rr:predicateObjectMap :pom_000.
:pm_000 a rr:PredicateMap.
:pom_000 rr:predicateMap :pm_000.
:pm_000 rr:constant skos:example.
:pom_000 rr:objectMap :om_000.
:om_000 a rr:ObjectMap;
    rml:reference "Index Item Description";
    rr:termType rr:Literal;
    rml:languageMap :language_000.
:language_000 rr:constant "en".

Input file: input.json

{"NAICS22":"315990","Index Item Description":"Hats, cloth, cut and sewn from purchased fabric (except apparel contractors)"}

Running it through the CLI, I get:

python converter.py -o test.ttl mapping.ttl
Traceback (most recent call last):
  File "/Users/henrieglesorotos/repos/pyrml/converter.py", line 65, in <module>
    PyrmlCMDTool().do_map()
  File "/Users/henrieglesorotos/repos/pyrml/converter.py", line 34, in do_map
    g = rml_converter.convert(self.__args.input, self.__args.m)
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_mapper.py", line 131, in convert
    triple_mappings = RMLParser.parse(rml_mapping)
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_mapper.py", line 46, in parse
    return TripleMappings.from_rdf(g)
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 1586, in from_rdf
    return set([TripleMappings.__build(g, row) for row in qres])
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 1586, in <listcomp>
    return set([TripleMappings.__build(g, row) for row in qres])
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 1594, in __build
    predicate_object_maps = PredicateObjectMap.from_rdf(g, row.tm)
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 752, in from_rdf
    return list(map(lmbd(g), qres))
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 751, in <lambda>
    lmbd = lambda graph : lambda row :  PredicateObjectMap.__build(graph, row)
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 758, in __build
    predicates = PredicateBuilder.build(g, row.pom)
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 669, in build
    predicates += PredicateMap.from_rdf(g, predicate_ref)
  File "/Users/henrieglesorotos/repos/pyrml/pyrml/pyrml_core.py", line 629, in from_rdf
    pm = PredicateMap(row.tripleMap, row.map, row.termType, row.predicateMap)
  File "/Users/henrieglesorotos/repos/pyrml/venv/lib/python3.9/site-packages/rdflib/query.py", line 124, in __getattr__
    raise AttributeError(name)
AttributeError: tripleMap

Any ideas?

henrieglesorotos, Nov 09 '23 16:11

By the way, we generally write our mappings in YARRRML because it's simpler, and then convert them to RML using https://github.com/RMLio/yarrrml-parser

henrieglesorotos, Nov 09 '23 16:11
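A sketch of that two-step workflow driven from Python. It assumes yarrrml-parser is installed and on PATH; the tool reads the YARRRML file given with -i and writes RML to the path given with -o. It reuses the converter.py invocation from the traceback above. The function names and file names are hypothetical:

```python
import subprocess
from typing import List


def yarrrml_to_rml_cmd(yarrrml_path: str, rml_path: str) -> List[str]:
    # yarrrml-parser: -i <YARRRML input>, -o <RML output>
    return ["yarrrml-parser", "-i", yarrrml_path, "-o", rml_path]


def pyrml_cmd(rml_path: str, output_path: str) -> List[str]:
    # Same shape as the command in the traceback: python converter.py -o <out> <mapping>
    return ["python", "converter.py", "-o", output_path, rml_path]


def run_pipeline(yarrrml_path: str, rml_path: str, output_path: str) -> None:
    """Convert YARRRML to RML, then run pyrml on the result; stop at the first failure."""
    for cmd in (yarrrml_to_rml_cmd(yarrrml_path, rml_path), pyrml_cmd(rml_path, output_path)):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    run_pipeline("mapping.yml", "mapping.ttl", "test.ttl")
```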

FYI, our environment:

python --version: Python 3.9.0

pip freeze:

click==8.1.7
decorator==5.1.1
Flask==2.2.2
importlib-metadata==6.8.0
isodate==0.6.1
itsdangerous==2.1.2
Jinja2==3.1.2
jsonpath-ng==1.5.3
lark-parser==0.12.0
MarkupSafe==2.1.3
numpy==1.23.4
pandas==1.5.1
ply==3.11
pyparsing==3.1.1
pyrml==0.3.0
python-dateutil==2.8.2
python-slugify==7.0.0
pytz==2023.3.post1
rdflib==6.2.0
shortuuid==1.0.9
six==1.16.0
SPARQLWrapper==2.0.0
text-unidecode==1.3
Unidecode==1.3.7
werkzeug==3.0.1
zipp==3.17.0

henrieglesorotos, Nov 09 '23 16:11
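When reporting a bug like this, the versions that matter most can be extracted directly rather than pasting the full pip freeze. A small sketch using the standard library's importlib.metadata (Python 3.8+). The package list is only a guess at what is relevant to this traceback and to the templating discussion:

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distributions that appear in the traceback or in the Jinja2 discussion.
RELEVANT = ("pyrml", "rdflib", "Jinja2", "jsonpath-ng", "pandas")


def environment_report(packages: Iterable[str] = RELEVANT) -> Dict[str, Optional[str]]:
    """Return the Python version plus each package's version (None if not installed)."""
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in environment_report().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```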

Did you manage to reproduce this, @anuzzolese?

henrieglesorotos, Nov 15 '23 10:11