Dataset .nq files are not valid n-quads

Open vilunov opened this issue 5 years ago • 4 comments

Hello! According to this specification of N-Quads, the character @ can appear only in literals, i.e. directives (lines starting with @) are not allowed in .nq files. This makes it impossible to import EventKG into triplestores with strict parsers, such as Apache Jena and Neo4j (with a plugin). The only triplestore I managed to import it into is OpenLink Virtuoso (the same one you use to serve the endpoint, I suppose), but it lacks the features I need, and vendor lock-in to one implementation is generally not good. Is there any way to solve this issue? I haven't run the pipeline manually, but if you can tell me whether this is solvable and how I can fix it, I'd be happy to submit a pull request.

vilunov avatar Jul 01 '19 17:07 vilunov

Also, I believe that wherever typed literals are used, the xsd prefix must be defined too. It is defined in schema.ttl, but not in void.ttl, for example.

vilunov avatar Jul 01 '19 20:07 vilunov

Hi! Thanks a lot for your advice. I ran my own test, loaded the files with Apache Jena, and can confirm both of your problems. You were right in assuming that I am using Virtuoso as the triplestore, which seems to be much more tolerant.

I will update the code to create valid NQ files. Are you planning to run the code yourself or are you just interested in the valid data?

sgottsch avatar Jul 03 '19 10:07 sgottsch

Hi, thanks a lot for the reply. I am currently interested in the valid data only, but I could run the code in my future work.

vilunov avatar Jul 03 '19 12:07 vilunov

Alright. I am currently preparing a new version of EventKG (2.1), with current data and some minor corrections and extensions. I will also provide the valid .nq files then. However, this will take some time (one week maybe?). I'll let you know when it's done.

If you need a fast solution, you could instead just transform the existing .nq files with simple string operations (replacement of the prefix namespaces with the actual URLs).
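
A minimal sketch of such a transformation, in Python. It assumes the files declare namespaces with Turtle-style `@prefix name: <iri> .` lines and use `name:local` terms separated by whitespace; that layout is an assumption, not something verified against the actual EventKG dumps. The function name `expand_prefixes` and the example IRIs are hypothetical.

```python
import re

# Assumed directive shape: "@prefix name: <iri> ." (not checked against the real files).
PREFIX_LINE = re.compile(r'^\s*@prefix\s+([A-Za-z][\w-]*):\s*<([^>]*)>\s*\.\s*$')

# Match string literals and full IRIs first so that their contents are never
# rewritten; only the third alternative (a prefixed name) is expanded.
TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"'          # quoted string literal, with escapes
    r'|<[^>]*>'                   # IRI already in angle brackets
    r'|(?<![\w:])(?P<prefix>[A-Za-z][\w-]*):(?P<local>[^\s<>"]*)'
)


def expand_prefixes(lines):
    """Drop @-directives and replace prefixed names with full IRIs."""
    prefixes = {}
    result = []

    def replace(token):
        prefix = token.group('prefix')
        if prefix is None or prefix not in prefixes:
            return token.group(0)  # literal, IRI, or unknown prefix: keep as-is
        return '<' + prefixes[prefix] + token.group('local') + '>'

    for line in lines:
        declaration = PREFIX_LINE.match(line)
        if declaration:
            prefixes[declaration.group(1)] = declaration.group(2)
            continue
        if line.lstrip().startswith('@'):
            continue  # any other directive is not valid N-Quads either
        result.append(TOKEN.sub(replace, line))
    return result


if __name__ == '__main__':
    sample = [
        '@prefix ex: <http://example.org/> .',
        '@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .',
        'ex:e1 ex:date "2019-07-01"^^xsd:date ex:g .',
    ]
    for quad in expand_prefixes(sample):
        print(quad)
```

The sketch only expands prefixes that were declared earlier in the same input, and it leaves blank nodes (`_:b0`) and language tags (`"..."@en`) untouched. Running the output through a strict parser such as Apache Jena is still the real check.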

sgottsch avatar Jul 03 '19 12:07 sgottsch