amazon-neptune-tools
Exporting RDF Graph fails with QueryEvaluationException after 4-5 minutes
I run the following command to export an RDF Graph:
java -Xms16g -Xmx16g -jar neptune-export.jar export-rdf -e orpheus-6-instance-1.cfm103hnhdrl.us-east-2.neptune.amazonaws.com -p 8182 --output files -d /home/ec2-user/neptune-export --region us-east-2 --format neptuneStreamsJson --use-ssl --use-iam-auth
After running for around 4 minutes or so the export terminates (with a partial export) with the following stack trace:
java.lang.RuntimeException: org.eclipse.rdf4j.query.QueryEvaluationException: Unkown record type: 123
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeQuery(NeptuneSparqlClient.java:167)
at com.amazonaws.services.neptune.rdf.io.ExportRdfGraphJob.lambda$execute$0(ExportRdfGraphJob.java:34)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.rdf.io.ExportRdfGraphJob.execute(ExportRdfGraphJob.java:31)
at com.amazonaws.services.neptune.ExportRdfGraph.lambda$run$0(ExportRdfGraph.java:63)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.ExportRdfGraph.run(ExportRdfGraph.java:55)
at com.amazonaws.services.neptune.export.NeptuneExportRunner.run(NeptuneExportRunner.java:44)
at com.amazonaws.services.neptune.NeptuneExportCli.main(NeptuneExportCli.java:48)
Caused by: org.eclipse.rdf4j.query.QueryEvaluationException: Unkown record type: 123
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeQuery(NeptuneSparqlClient.java:127)
... 10 more
Caused by: java.io.IOException: Unkown record type: 123
at org.eclipse.rdf4j.query.resultio.binary.BinaryQueryResultParser.parse(BinaryQueryResultParser.java:188)
at org.eclipse.rdf4j.query.resultio.AbstractTupleQueryResultParser.parseQueryResult(AbstractTupleQueryResultParser.java:48)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:699)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQuery(SPARQLProtocolSession.java:369)
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:56)
... 11 more
An error occurred while exporting from Neptune: org.eclipse.rdf4j.query.QueryEvaluationException: Unkown record type: 123
Hi @cloudronin – Thanks for reporting this. Can you identify the specific records that the tool is trying to export at the point where it fails, and provide some details about the content? From the exception, it looks like the tool is trying to export a piece of binary content that rdf4j can't recognize.
Closing this issue for lack of a reproducer.
fwiw, I'm able to reproduce this error on a private neptune instance. What debug information should I include to help the team understand the issue in question?
In particular, if we can't expect to fix this, it would be nice to know how to work around it with a well-crafted SPARQL query :)
@azoff - If you have a sample dataset that can reproduce the error above, that would be useful for us to use in troubleshooting and providing a fix.
@triggan I'd love to help but the irony is that we can't export the errant data - any tips on how to export the data in question, console or CLI? Keep in mind we're a bit new to neptune and graph dbs
@azoff - I think I discovered the root cause. If one of the SPARQL queries issued by the neptune-export utility returns an error, the RDF4J library we use to handle these queries cannot parse the error message in the response. Instead, it fails with Unkown record type: 123 (typo included; it exists in the RDF4J code).
I traced this by logging the bytes RDF4J was reading at the point where it expected RDF values. Below is the output of the traced value:
Encoded String: [B@695c938d
Converted String: uestId":"edc05258-f4b2-4b01-b944-d96abf61a741","code":"TimeLimitExceededException","detailedMessage":"Operation terminated (deadline exceeded)"}
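A small aside on the number in the message: 123 is the ASCII code for `{`, which would be consistent with the binary result parser meeting the start of a JSON error object where it expected a record marker. That reading is an inference from the traced output above, not something confirmed in the RDF4J source. The sketch below also shows why the "Encoded String" line is unreadable: printing a `byte[]` directly prints its identity, so it has to be decoded first. The class name, method names, and sample payload are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RecordTypeDemo {

    // Decode raw bytes as UTF-8 text. Printing a byte[] directly only shows
    // its type and identity hash (for example "[B@695c938d"), not its contents.
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Render a single byte value as the ASCII character it represents.
    public static char asAscii(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        // Hypothetical JSON error body, modelled on the traced output above.
        byte[] errorBody = "{\"code\":\"TimeLimitExceededException\"}"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("First byte: " + errorBody[0]);
        System.out.println("As ASCII:   " + asAscii(123));
        System.out.println("Decoded:    " + decode(errorBody));
    }
}
```

If the first unexpected byte in a failed export is always 123, that is a reasonable hint that the server sent a JSON error rather than corrupt RDF data.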
In my case I wasn't creating a cloned cluster, but using the original source cluster for the export (with the default 2-minute query timeout). When using a cloned cluster (via the --clone-cluster parameter), the neptune-export tool creates a parameter group with the query timeout set to its maximum value to avoid such situations.
If you're not using a cloned cluster, I would suggest starting there to see if using the cloned cluster eliminates your current issue. I'll re-open this issue for us to determine a better way to handle potential query errors.
So the issue (at least for me) might still surface in the same way, but it's happening after the 2-minute timeout window (we bumped the timeout up using a cluster parameter just to be sure). It happens closer to 4 minutes in, but still produces that nondescript error. I guess I could find out what's happening once you guys patch the library to properly decode and return unexpected errors? At least then I could better understand why this is happening and provide a better error message.
An error occurred while exporting RDF as Turtle. Elapsed time: 277 seconds
An error occurred while exporting rdf graph. Elapsed time: 282 seconds
java.lang.RuntimeException: org.eclipse.rdf4j.query.QueryEvaluationException: Unkown record type: 123
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeTupleQuery(NeptuneSparqlClient.java:121)
at com.amazonaws.services.neptune.rdf.ExportRdfGraphJob.lambda$execute$0(ExportRdfGraphJob.java:34)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.rdf.ExportRdfGraphJob.execute(ExportRdfGraphJob.java:31)
at com.amazonaws.services.neptune.ExportRdfGraph.lambda$run$0(ExportRdfGraph.java:77)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.ExportRdfGraph.run(ExportRdfGraph.java:60)
at com.amazonaws.services.neptune.export.NeptuneExportRunner.run(NeptuneExportRunner.java:72)
at com.amazonaws.services.neptune.NeptuneExportCli.main(NeptuneExportCli.java:48)
Caused by: org.eclipse.rdf4j.query.QueryEvaluationException: Unkown record type: 123
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeTupleQuery(NeptuneSparqlClient.java:118)
... 10 more
Caused by: java.io.IOException: Unkown record type: 123
at org.eclipse.rdf4j.query.resultio.binary.BinaryQueryResultParser.parse(BinaryQueryResultParser.java:188)
at org.eclipse.rdf4j.query.resultio.AbstractTupleQueryResultParser.parseQueryResult(AbstractTupleQueryResultParser.java:48)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:699)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQuery(SPARQLProtocolSession.java:369)
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:56)
... 11 more
An error occurred while exporting from Neptune: org.eclipse.rdf4j.query.QueryEvaluationException: Unkown record type: 123
The export tool issues multiple queries, so it may not be timing out on the initial requests; it could be timing out on a later request. If you haven't tried running this with the --clone-cluster parameter, I would suggest doing that, or setting your query timeout very high during the export. We call out in the README that a higher query timeout will likely be required during export operations, because the queries we issue fetch and return large result sets.
I wanted to jump in on this one and say that my observations match @triggan's earlier findings. I want to mention as well that sometimes when testing this I would instead get the following stack trace when a query timeout occurred:
java.lang.RuntimeException: org.eclipse.rdf4j.query.QueryEvaluationException: Malformed query result from server
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeTupleQuery(NeptuneSparqlClient.java:121)
at com.amazonaws.services.neptune.rdf.ExportRdfGraphJob.lambda$execute$0(ExportRdfGraphJob.java:34)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.rdf.ExportRdfGraphJob.execute(ExportRdfGraphJob.java:31)
at com.amazonaws.services.neptune.ExportRdfGraph.lambda$run$0(ExportRdfGraph.java:77)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.ExportRdfGraph.run(ExportRdfGraph.java:60)
at com.amazonaws.services.neptune.export.NeptuneExportRunner.run(NeptuneExportRunner.java:72)
at com.amazonaws.services.neptune.ExportRdfIntegrationTest.testExportRdf(ExportRdfIntegrationTest.java:23)
at com.amazonaws.services.neptune.ExportRdfIntegrationTest.main(ExportRdfIntegrationTest.java:54)
Caused by: org.eclipse.rdf4j.query.QueryEvaluationException: Malformed query result from server
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeTupleQuery(NeptuneSparqlClient.java:118)
... 11 more
Caused by: org.eclipse.rdf4j.repository.RepositoryException: Malformed query result from server
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:701)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQuery(SPARQLProtocolSession.java:369)
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:56)
... 12 more
Caused by: org.eclipse.rdf4j.query.resultio.QueryResultParseException: org.xml.sax.SAXParseException; lineNumber: 1317; columnNumber: 192; XML document structures must start and end within the same entity.
at org.eclipse.rdf4j.query.resultio.sparqlxml.AbstractSPARQLXMLParser.fatalError(AbstractSPARQLXMLParser.java:336)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:199)
at org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:180)
at org.eclipse.rdf4j.query.resultio.sparqlxml.AbstractSPARQLXMLParser.parseQueryResultInternal(AbstractSPARQLXMLParser.java:187)
at org.eclipse.rdf4j.query.resultio.sparqlxml.SPARQLResultsXMLParser.parseQueryResult(SPARQLResultsXMLParser.java:73)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:699)
... 14 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1317; columnNumber: 192; XML document structures must start and end within the same entity.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
... 37 more
An error occurred while exporting from Neptune: org.eclipse.rdf4j.query.QueryEvaluationException: Malformed query result from server
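One possible reading of this second trace is that the response was cut off partway through the XML results when the timeout hit, so the parser ran out of input inside an open element. That is an assumption based on the SAXParseException text, not something verified against Neptune. A minimal sketch using only the JDK's built-in SAX parser shows that truncated XML is rejected in this way; the class name and the toy documents are hypothetical and far simpler than real SPARQL results.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class TruncatedXmlDemo {

    // Returns the parser's error message, or null if the document parsed cleanly.
    public static String parseError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Raised for malformed or incomplete documents.
            return e.getMessage();
        } catch (Exception e) {
            // Configuration or I/O problems are not expected for in-memory input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String complete = "<sparql><results><result/></results></sparql>";
        // Same document with the closing tags missing, as if the stream stopped early.
        String truncated = "<sparql><results><result/>";

        System.out.println("complete:  " + parseError(complete));
        System.out.println("truncated: " + parseError(truncated));
    }
}
```

Either way, both traces appear to be the same underlying timeout surfacing through different result parsers.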
Just a quick update here: Neptune-Export has been migrated to a new repository and has adopted a versioned release structure.
A fix has been merged in the new repository and a release is planned for the end of the month. Neptune-Export will now extract error messages from trailing headers and add them to the output. With this update a timeout exception will now look something like this:
Creating statement files
An error occurred while exporting RDF as Turtle. Elapsed time: 0 seconds
An error occurred while exporting rdf graph. Elapsed time: 1 seconds
java.lang.RuntimeException: X-Neptune-Status: 500 TimeLimitExceededException
X-Neptune-Detail: {"code":"TimeLimitExceededException","requestId":"dcc37392-6714-4e18-e301-efab45ec6ea2","detailedMessage":"Operation terminated (deadline exceeded)"}
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeTupleQuery(NeptuneSparqlClient.java:125)
at com.amazonaws.services.neptune.rdf.ExportRdfGraphJob.lambda$execute$0(ExportRdfGraphJob.java:34)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.rdf.ExportRdfGraphJob.execute(ExportRdfGraphJob.java:31)
at com.amazonaws.services.neptune.ExportRdfGraph.lambda$run$0(ExportRdfGraph.java:77)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:41)
at com.amazonaws.services.neptune.util.Timer.timedActivity(Timer.java:34)
at com.amazonaws.services.neptune.ExportRdfGraph.run(ExportRdfGraph.java:60)
at com.amazonaws.services.neptune.export.NeptuneExportRunner.run(NeptuneExportRunner.java:72)
at com.amazonaws.services.neptune.NeptuneExportCli.main(NeptuneExportCli.java:48)
Caused by: org.eclipse.rdf4j.query.QueryEvaluationException: Unkown record type: 123
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeTupleQuery(NeptuneSparqlClient.java:121)
... 10 more
Caused by: java.io.IOException: Unkown record type: 123
at org.eclipse.rdf4j.query.resultio.binary.BinaryQueryResultParser.parse(BinaryQueryResultParser.java:188)
at org.eclipse.rdf4j.query.resultio.AbstractTupleQueryResultParser.parseQueryResult(AbstractTupleQueryResultParser.java:48)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:699)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQuery(SPARQLProtocolSession.java:369)
at org.eclipse.rdf4j.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:56)
... 11 more
An error occurred while exporting from Neptune: X-Neptune-Status: 500 TimeLimitExceededException
X-Neptune-Detail: {"code":"TimeLimitExceededException","requestId":"dcc37392-6714-4e18-e301-efab45ec6ea2","detailedMessage":"Operation terminated (deadline exceeded)"}
This information from the trailers should make the cause of any server-side failure much more visible to export users.
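In case it helps anyone scripting around the tool, here is a rough sketch of pulling the error code out of an X-Neptune-Detail value like the one above, for example to decide whether to raise the query timeout and retry. It is not how Neptune-Export itself handles the trailers; the class and method names are hypothetical, and the regex assumes the flat JSON shape shown in the sample output rather than being a general JSON parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NeptuneDetailDemo {

    // Matches "code":"<value>" in a flat JSON object such as the sample above.
    private static final Pattern CODE =
            Pattern.compile("\"code\"\\s*:\\s*\"([^\"]+)\"");

    // Extract the error code from an X-Neptune-Detail value, if one is present.
    public static Optional<String> errorCode(String detail) {
        Matcher m = CODE.matcher(detail);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True when the detail reports a query timeout.
    public static boolean isTimeout(String detail) {
        return errorCode(detail)
                .map("TimeLimitExceededException"::equals)
                .orElse(false);
    }

    public static void main(String[] args) {
        String detail = "{\"code\":\"TimeLimitExceededException\","
                + "\"requestId\":\"dcc37392-6714-4e18-e301-efab45ec6ea2\","
                + "\"detailedMessage\":\"Operation terminated (deadline exceeded)\"}";

        System.out.println(errorCode(detail).orElse("unknown"));
        System.out.println(isTimeout(detail));
    }
}
```

For anything beyond a quick check, a proper JSON library would be a safer choice than a regex.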