neosemantics

ImportRDF of turtle file fails with terminationStatus "KO"

Open rsit opened this issue 6 years ago • 15 comments

Hi,

Somehow the file serialization of the example in the tutorial fails. If I load the example remotely as explained in the tutorial, it works fine. If I save the file locally and importRDF it via file://..., it fails with: terminationStatus: "KO", extraInfo: "C". I am running neo4j in a Docker image. Do you have any idea how to solve this issue?

Thanks and best regards! Robert

rsit avatar Oct 17 '19 09:10 rsit

For me, the following command will not work, either using http:// or file://. Note that this is an example taken from the Website documentation page (http://neo4j-labs.github.io/neosemantics/#common_params)

CALL semantics.importOntology("http://jbarrasa.github.io/neosemantics/docs/rdf/vw.owl","Turtle")

I also get terminationStatus: "KO".

I downloaded the jar from your website and placed it in the plugins directory. Is there any compatibility issue between the downloaded jar and the version of neo4j I'm running (3.5.6)?

Thank you

zeidk avatar Oct 17 '19 15:10 zeidk

Hi Robert, I had a similar issue in the past. Can you open the downloaded file and check that it really contains what it should? Sometimes the saved file contains HTML tags instead of the plain RDF.
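That check can also be done without opening the file by hand, by looking at the first non-blank characters of the download. A minimal sketch, assuming the first few kilobytes are enough to tell; `looks_like_html`, `file_looks_like_html` and the marker list are hypothetical helpers, not part of neosemantics:

```python
# Hypothetical markers suggesting an HTML page was saved instead of raw RDF.
HTML_MARKERS = ("<!doctype html", "<html")


def looks_like_html(head: str) -> bool:
    """Return True if the start of the text looks like an HTML document."""
    # Drop a possible byte-order mark and leading whitespace before comparing.
    cleaned = head.lstrip("\ufeff \t\r\n").lower()
    return cleaned.startswith(HTML_MARKERS)


def file_looks_like_html(path: str, probe_bytes: int = 4096) -> bool:
    """Read only the first bytes of the file and apply the check above."""
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes).decode("utf-8", errors="replace")
    return looks_like_html(head)
```

A Turtle file normally starts with `@prefix` lines or comments, so a `True` result here would point at the download rather than at the import call.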

zeidk avatar Oct 17 '19 15:10 zeidk

I looked into it, it looks fine. I am using neo4j-community-3.5.11-unix

rsit avatar Oct 17 '19 16:10 rsit

I managed to load the RDF file (stored online) into my database. I then did the same thing as you did, I downloaded the file on my computer, checked that it's identical to the one stored online, and tried to import it. This didn't work.

There seems to be a command missing in the conf file that allows you to locally import a file.

I hope the developer can give us an answer very soon.

zeidk avatar Oct 17 '19 16:10 zeidk

Hi @rsit and @zeidk, a couple of comments:

  1. Some of the urls in the documentation are incorrect. Apologies for that. When we moved the repo from jbarrasa to neo4j-labs we managed to break some links :(
    We'll try to get them fixed ASAP. The sample ontology vw.owl is actually here at the moment. Also, I'd recommend you use this user guide instead of the old one.

  2. Re. the incomplete error message you're getting when trying to load a local file... is there anything in the logs that can help to figure out what's going on? Is it possible that we're dealing with Windows paths? They require a slightly different syntax. Take a look at this comment and let me know if it helps.

jbarrasa avatar Oct 23 '19 13:10 jbarrasa

Hi @jbarrasa,

thanks for your reply and your help. Unfortunately, the problem still persists. In the neo4j logs folder I only find debug.log. However, the semantics.importRDF call does not add any entries to this file. The file serialization does find the file, but somehow neo seems to stumble when parsing it. I also tried putting a copy of the nsmntx.ttl example file from your website into an nginx docker container and doing the importRDF("http...) serialization. Same result: terminationStatus "KO". When I do the same thing with the nsmntx.ttl example from your website, neo loads it flawlessly. I was thinking that it might be an encoding problem. The file is UTF-8 encoded and I tried both LF and CRLF. Any ideas?

rsit avatar Oct 23 '19 14:10 rsit

@rsit thanks for the quick reply. would it be possible for you to put the file on a public URL so I can try to import from my end and debug the issue?

jbarrasa avatar Oct 23 '19 15:10 jbarrasa

I managed to load the file today from nginx docker container. However, my problem of serializing it from filesystem still persists. I attached the file to this post.

nsmntx.txt

rsit avatar Oct 24 '19 08:10 rsit

All looking good from here. I can parse the file directly from your post, so the problem is clearly not with the file...

Not sure what to try next? Can you try with other RDF files (maybe in other serialisation formats) on the same directory in your filesystem?

jbarrasa avatar Oct 24 '19 09:10 jbarrasa

Hi @jbarrasa. I have a similar issue(s):

CALL semantics.importRDF("https://.../data/nsmntx.ttl","Turtle")

is not working, but

CALL semantics.importRDF("https://raw.githubusercontent.com/jbarrasa/neosemantics/3.5/docs/rdf/nsmntx.ttl","Turtle")

is definitely working.

The nsmntx.ttl is the same exact file. The error is: "IRI included an unencoded space: '32' [line 2]". Here is the full response:

[
  {
    "keys": ["terminationStatus", "triplesLoaded", "triplesParsed", "namespaces", "extraInfo", "configSummary"],
    "length": 6,
    "_fields": [
      "KO",
      { "low": 0, "high": 0 },
      { "low": 0, "high": 0 },
      {
        "http://purl.org/dc/elements/1.1/": "dc",
        "http://purl.org/dc/terms/": "dct",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
        "http://www.w3.org/2002/07/owl#": "owl",
        "http://www.w3.org/2004/02/skos/core#": "skos",
        "http://schema.org/": "sch",
        "http://www.w3.org/ns/shacl#": "sh",
        "http://www.w3.org/2000/01/rdf-schema#": "rdfs",
        "http://neo4j.org/vocab/sw#": "ns0"
      },
      "IRI included an unencoded space: '32' [line 2]",
      {}
    ],
    "_fieldLookup": {
      "terminationStatus": 0,
      "triplesLoaded": 1,
      "triplesParsed": 2,
      "namespaces": 3,
      "extraInfo": 4,
      "configSummary": 5
    }
  }
]

I also tried other RDF/XMLs and get "unqualified attribute 'lang' not allowed [line 3, column 17]" on all different RDF files. Here are a couple of simple RDF files:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:si="https://www.w3schools.com/rdf/">
  <rdf:Description rdf:about="https://www.w3schools.com">
    <si:title>W3Schools.com</si:title>
    <si:author>Jan Egil Refsnes</si:author>
  </rdf:Description>
</rdf:RDF>

and another one:

<rdf:RDF xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#" xmlns:eric="http://www.w3.org/People/EM/contact#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:mailbox rdf:resource="mailto:e.miller123(at)example"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:personalTitle>Dr.</contact:personalTitle>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
    <rdf:type rdf:resource="http://www.w3.org/2000/10/swap/pim/contact#Person"/>
  </rdf:Description>
</rdf:RDF>

CALL semantics.importRDF("https://..../data/simple-rdf-xml.rdf","RDF/XML")

sri-roivant avatar Nov 25 '19 20:11 sri-roivant

Is this all to say that we can't load a local file into a dockerized neo4j with neosemantics commands?

When I push my files to github for instance they load from there fine... but when I reference local files I get nothing....

This is how I'm running my instance:

docker run --publish=7474:7474 --publish=7687:7687 --volume=$HOME/neo4j/data:/data --env='NEO4JLABS_PLUGINS=["apoc", "n10s"]' --env=NEO4J_AUTH=none neo4j:latest

I suppose it's worth mentioning I've tried both from localhost:7474 and using py2neo:

from py2neo import Graph

g = Graph("bolt://localhost:7687")

# n10s needs a uniqueness constraint on Resource.uri; ignore the error if it already exists.
try:
    g.run('CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE')
except Exception:
    print("Unique constraint on r:Resource already exists")

g.run("call n10s.graphconfig.init({handleVocabUris: 'IGNORE'})")

# Try to fetch and import the local Turtle file.
stream_triples = 'CALL n10s.rdf.import.fetch("file://Users/liam/spectral_work_1/Brick/examples/custom_brick_v103_sample_graph.ttl","Turtle");'
data = g.run(stream_triples).data()
print(data)
# Output:
# [{'terminationStatus': 'KO', 'triplesLoaded': 0, 'triplesParsed': 0, 'namespaces': None, 'extraInfo': 'Users', 'callParams': {}}]
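
The `extraInfo` value `'Users'` in that output is consistent with how a `file://` URL with only two slashes is split: the first path segment ends up in the host position. The sketch below illustrates this with Python's standard `urllib.parse`; that the Java URL handling inside n10s behaves the same way is an assumption, and `to_file_uri` is a hypothetical helper:

```python
from urllib.parse import quote, urlparse

url = "file://Users/liam/spectral_work_1/Brick/examples/custom_brick_v103_sample_graph.ttl"
parts = urlparse(url)

# With two slashes, "Users" is read as the host and is missing from the path.
print(parts.netloc)
print(parts.path)


def to_file_uri(absolute_posix_path: str) -> str:
    """Build a file URI with an empty host, i.e. three slashes before the path."""
    if not absolute_posix_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    return "file://" + quote(absolute_posix_path)


fixed = to_file_uri(
    "/Users/liam/spectral_work_1/Brick/examples/custom_brick_v103_sample_graph.ttl"
)
print(fixed)
```

Even with a well-formed URI, the path would presumably be resolved on the machine where Neo4j runs, so with a dockerized instance the file also has to exist at that path inside the container.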

Logs show nothing...

APOC couln't set a URLStreamHandlerFactory since some other tool already did this (e.g. tomcat). This means you cannot use s3:// or hdfs:// style URLs in APOC. This is caused by a limitation of the JVM which we cannot fix.
2020-10-01 13:00:42.730+0000 INFO  Starting...
2020-10-01 13:00:53.298+0000 INFO  ======== Neo4j 4.1.2 ========
2020-10-01 13:01:35.685+0000 INFO  Performing postInitialization step for component 'security-users' with version 2 and status CURRENT
2020-10-01 13:01:35.688+0000 INFO  Updating the initial password in component 'security-users'
2020-10-01 13:01:43.235+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
2020-10-01 13:01:43.426+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2020-10-01 13:01:46.794+0000 INFO  Remote interface available at http://localhost:7474/
2020-10-01 13:01:46.797+0000 INFO  Started.

debug.log looks uninteresting too:

2020-10-02 01:15:52.395+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=142, gcTime=180, gcCount=1}
2020-10-02 01:15:53.378+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] [neo4j] Checkpoint triggered by "Scheduled checkpoint for every 15 minutes threshold" @ txId: 200 checkpoint started...
2020-10-02 01:15:53.662+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] [neo4j] Checkpoint triggered by "Scheduled checkpoint for every 15 minutes threshold" @ txId: 200 checkpoint completed in 282ms
2020-10-02 01:15:53.671+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] [neo4j] No log version pruned. The strategy used was '104857600 size'. Last checkpoint was made in log version 0.
2020-10-02 01:16:33.365+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=122, gcTime=195, gcCount=1}
2020-10-02 01:16:49.118+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=138, gcTime=0, gcCount=0}
2020-10-02 01:16:49.301+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=257, gcTime=0, gcCount=0}
2020-10-02 01:16:55.921+0000 INFO [o.n.c.i.ExecutionEngine] [neo4j] Discarded stale query from the query cache after 22764 seconds. Reason: NodesAllCardinality changed from 139.0 to 10.0, which is a divergence of 0.9280575539568345 which is greater than threshold 0.10914526452768099. Query: MATCH (gc:_GraphConfig) RETURN gc
2020-10-02 01:17:16.098+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=148, gcTime=0, gcCount=0}
root@11a002070741:/var/lib/neo4j/logs#

iamliamc avatar Oct 02 '20 01:10 iamliamc

Hi @iamliamc, I think you are right... I am trying to import local RDF files into a neo4j instance running in a docker container, but it is impossible. I think neosemantics tries to find the file inside the docker container; however, the files that we would like to import are outside the docker container... Have you found any solution for this, other than copying files into the container or putting them at a public URL? Maybe this also fails when importing any file into a remote neo4j instance, since the source files are not on the neo4j server.

fanavarro avatar Mar 15 '21 19:03 fanavarro

Hi @fanavarro I think the options are:

  1. Mount a local directory into the container at runtime, i.e. something like this, though I haven't tried it myself: docker run -v /local/some_folder:/container/some_folder
     Then reference that with the n10s call...

  2. Host the files so they can be retrieved from the web
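
For the mount option, the path passed to n10s would have to be the path as seen inside the container, not the host path. A minimal sketch of that translation, untested against a real container; `to_container_uri` and the file name `example.ttl` are hypothetical, and the two directories are the placeholders from the `docker run -v` example:

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def to_container_uri(host_path: str, host_dir: str, container_dir: str) -> str:
    """Translate a host file path into a file URI valid inside the container."""
    # relative_to raises ValueError if the file is not under the mounted directory.
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_dir))
    container_path = PurePosixPath(container_dir) / relative
    return "file://" + quote(str(container_path))


uri = to_container_uri(
    "/local/some_folder/example.ttl",  # hypothetical file on the host
    "/local/some_folder",              # host side of the -v mount
    "/container/some_folder",          # container side of the -v mount
)
cypher = f'CALL n10s.rdf.import.fetch("{uri}", "Turtle");'
print(cypher)
```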

iamliamc avatar Mar 15 '21 22:03 iamliamc

Hi @iamliamc. In the end I adopted a third option... In my application, I am using rdf4j to read the RDF files into a buffer of statements. Once the buffer is full, I serialize the statements into Turtle format and pass the resulting string to the n10s.rdf.import.inline function to upload the statements into the neo4j database. Nonetheless, I'm experiencing memory usage issues... I'll open a new issue for that.
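
The buffering idea can be sketched in Python as well. This is not the rdf4j code described above: it assumes the data is already in N-Triples (one complete triple per line, so it can be split anywhere between lines) rather than Turtle, and `batch_ntriples`, `import_in_batches` and the `send` callback are hypothetical names:

```python
from typing import Callable, Iterable, Iterator, List


def batch_ntriples(lines: Iterable[str], batch_size: int) -> Iterator[str]:
    """Group N-Triples lines into payload strings of at most batch_size triples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer: List[str] = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comments; they carry no triples.
        if not stripped or stripped.startswith("#"):
            continue
        buffer.append(stripped)
        if len(buffer) == batch_size:
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        yield "\n".join(buffer)


def import_in_batches(
    lines: Iterable[str], batch_size: int, send: Callable[[str], None]
) -> int:
    """Send each batch through the given callback and return the batch count."""
    count = 0
    for payload in batch_ntriples(lines, batch_size):
        send(payload)
        count += 1
    return count
```

With py2neo, `send` could be something like `lambda payload: g.run("CALL n10s.rdf.import.inline($payload, 'N-Triples')", payload=payload)`, and `lines` can be an open file object so that only one batch is held in memory at a time.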

fanavarro avatar Mar 17 '21 11:03 fanavarro

I am having a similar issue that manifests on Win10, local Neo4j 1.4.8 install, database 4.2.5 and n10s version 4.2.0.1. Fetching from local drive file://C:\Users\Otso\test.ttl fails with KO and extrainfo "C", but the same file uploaded to a remote server and referenced via a URL loads without any problems. On Debian, fetching the same file via file:///home/otso/test.ttl loads as well.
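
The `extraInfo` value "C" suggests that the drive letter in `file://C:\Users\Otso\test.ttl` is not being read as part of the path. The conventional file URI for a Windows drive path uses three slashes and forward slashes; whether n10s accepts that form on this setup is an assumption here, not something verified. A small sketch of the conversion, with `windows_path_to_file_uri` as a hypothetical helper:

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_file_uri(path: str) -> str:
    """Convert an absolute drive-letter path such as C:\\dir\\file to a file URI."""
    win = PureWindowsPath(path)
    # Only handle paths like C:\...; UNC shares and relative paths are rejected.
    if len(win.drive) != 2 or win.drive[1] != ":" or not win.root:
        raise ValueError("expected an absolute path with a drive letter")
    # as_posix() turns backslashes into forward slashes; keep "/" and ":" unescaped.
    return "file:///" + quote(win.as_posix(), safe="/:")


uri = windows_path_to_file_uri(r"C:\Users\Otso\test.ttl")
print(uri)
```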

OtsoHelenius avatar Sep 13 '21 11:09 OtsoHelenius