
Repository export operation does not produce a valid .trig file

Open MRCO-DURON opened this issue 1 year ago • 8 comments

Current Behavior

When I do: curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig

I only get a 15 KB file, while my repository is 145 GB, and my previous exports used to be 3 GB or more.

I have also tried using console.sh (/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh), and I get java.lang.OutOfMemoryError exceptions.

Expected Behavior

Get a valid export file using either console.sh or the RDF4J API.

Steps To Reproduce

curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig

/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh

Version

4.3.2

Are you interested in contributing a solution yourself?

None

Anything else?

No response

MRCO-DURON avatar Jul 29 '24 18:07 MRCO-DURON

Did this used to work before? Do you know which version it worked on before?

hmottestad avatar Jul 29 '24 21:07 hmottestad

Btw, TriG isn't a particularly good format for exporting a lot of data, since the TriG writer needs to know a lot about your data to format it correctly.

Have you tried with NQUADS? That should hopefully be a fully streaming data format.
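
For example, the same workbench export URL you posted should work with the N-Quads MIME type instead of TriG. Untested sketch, so treat the host and repository name as placeholders to adjust for your setup:

```
# Same workbench export call as above, but asking for N-Quads instead of TriG.
# "repositoryName" is a placeholder — use your actual repository ID.
curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Fn-quads" \
  --compressed -o ./repositoryName.nq
```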

hmottestad avatar Jul 29 '24 21:07 hmottestad

Sadly, I don't know what the last working version was. However, I have 3 other lower environments running the same version without issues.

The only difference is the size of the repositories.

You're saying I can export/convert my current repository (.ttl) as NQUADS?

MRCO-DURON avatar Aug 01 '24 16:08 MRCO-DURON

Here is also an error I get when using the eclipse-rdf4j console:

```
root@ip-172-31-38-149:~# bash /home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh
15:53:33.811 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - os.name = linux
15:53:33.814 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
RDF4J Console 4.3.2
Working dir: /home/ubuntu/eclipse-rdf4j-2.5.1/bin
Type 'help' for help.
connect http://127.0.0.1:8080/rdf4j-server
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/rdf4j-server
open reponame
Opened repository 'reponame'
muchamiel> export /mnt/test.trig
Exception in thread "main" org.eclipse.rdf4j.repository.RepositoryException:

HTTP Status 500 – Internal Server Error


Type Exception Report

Message Handler processing failed; nested exception is java.lang.OutOfMemoryError

Description The server encountered an unexpected condition that prevented it from fulfilling the request.

Exception

org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1094)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Root Cause

java.lang.OutOfMemoryError
java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeString(BinaryRDFWriter.java:346)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeLiteral(BinaryRDFWriter.java:322)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeValue(BinaryRDFWriter.java:293)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.assignId(BinaryRDFWriter.java:254)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.incValueFreq(BinaryRDFWriter.java:238)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.consumeStatement(BinaryRDFWriter.java:198)
org.eclipse.rdf4j.rio.helpers.AbstractRDFWriter.handleStatement(AbstractRDFWriter.java:109)
org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.exportStatements(SailRepositoryConnection.java:382)
org.eclipse.rdf4j.http.server.repository.statements.ExportStatementsView.render(ExportStatementsView.java:95)
org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1405)
org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1149)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1088)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Note The full stack trace of the root cause is available in the server logs.


Apache Tomcat/8.5.39 (Ubuntu)

at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1095)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1029)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQueryViaHttp(SPARQLProtocolSession.java:945)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:876)
at org.eclipse.rdf4j.http.client.RDF4JProtocolSession.getStatements(RDF4JProtocolSession.java:618)
at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.exportStatements(HTTPRepositoryConnection.java:274)
at org.eclipse.rdf4j.repository.base.AbstractRepositoryConnection.export(AbstractRepositoryConnection.java:189)
at org.eclipse.rdf4j.console.command.Export.export(Export.java:140)
at org.eclipse.rdf4j.console.command.Export.execute(Export.java:94)
at org.eclipse.rdf4j.console.Console.executeCommand(Console.java:379)
at org.eclipse.rdf4j.console.Console.start(Console.java:336)
```

MRCO-DURON avatar Aug 01 '24 16:08 MRCO-DURON

Glad to know it's not a regression at least.

Any chance you can confirm that this is still an issue on RDF4J 5.0.1?

Other than that it looks like there is something that should be streaming the output but is actually writing it to a byte array output stream instead.
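
In the meantime, a possible workaround is to bypass the workbench export view and pull the statements straight from the RDF4J Server REST endpoint with content negotiation, which should stream a line-based format like N-Quads. This is only a sketch I haven't run against your setup; the host and repository name are placeholders:

```
# GET on the server's /statements endpoint returns the full repository contents,
# serialized according to the Accept header (N-Quads here).
# "repositoryName" is a placeholder — use your actual repository ID.
curl -H "Accept: application/n-quads" \
  "http://localhost:8080/rdf4j-server/repositories/repositoryName/statements" \
  -o ./repositoryName.nq
```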

hmottestad avatar Aug 01 '24 20:08 hmottestad

I updated the files for it and it happens with 5.0.1 too. Same behavior.

Here is my config.ttl:

```
cat /var/lib/tomcat8/.RDF4J/server/repositories/myRepoName/config.ttl
@prefix ns: <http://www.openrdf.org/config/sail/native#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rep: <http://www.openrdf.org/config/repository#> .
@prefix sail: <http://www.openrdf.org/config/sail#> .
@prefix sb: <http://www.openrdf.org/config/sail/base#> .
@prefix sr: <http://www.openrdf.org/config/repository/sail#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#MyRepoName> a rep:Repository;
    rep:repositoryID "myRepoName";
    rep:repositoryImpl [
        rep:repositoryType "openrdf:SailRepository";
        sr:sailImpl [
            sail:sailType "openrdf:NativeStore";
            sb:evaluationStrategyFactory "org.eclipse.rdf4j.query.algebra.evaluation.impl.StrictEvaluationStrategyFactory";
            ns:tripleIndexes "spoc,posc"
        ]
    ];
    rdfs:label "Native store" .
```

MRCO-DURON avatar Aug 05 '24 15:08 MRCO-DURON

Thanks for checking. And just to be sure: is this also the case when using NQUADS?

hmottestad avatar Aug 05 '24 17:08 hmottestad

I tried exporting the current repo as .nq, but that did not work either. Is there any process for this, converting .trig to .nq?

MRCO-DURON avatar Aug 05 '24 19:08 MRCO-DURON