
Feature Request: Support for Offline Triplestore Dump to RDF Formats

Open: arcangelo7 opened this issue 11 months ago • 1 comment

Hello QLever Team,

I've been exploring the capabilities of QLever and its control script, qlever-control, for managing SPARQL queries and datasets. As far as I can tell, neither offers a way to dump the entire triplestore to an RDF file, and this functionality is crucial for handling very large datasets efficiently.

For large triplestores, dumping the data by paginating through SPARQL results with OFFSET and LIMIT becomes impractical because it takes far too long. Likewise, a single massive CONSTRUCT query over the entire dataset is not feasible because of memory limitations.
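To make the pagination approach concrete, here is a minimal sketch of a client-side dump loop. It assumes a hypothetical `run_query` callable that sends a SPARQL query to the endpoint and returns rows as already-serialized RDF terms; the query text, page size, and helper names are illustrative and not part of QLever or qlever-control. Every page is a separate request, and each request with a larger OFFSET asks the engine to skip more rows, which is where the time cost comes from.

```python
from typing import Callable, Iterator, List, Tuple

# A triple of already-serialized RDF terms, e.g. ("<http://ex/s>", "<http://ex/p>", "\"x\"").
Triple = Tuple[str, str, str]

# Hypothetical page query; ORDER BY keeps the row order stable across requests,
# so consecutive OFFSET windows neither overlap nor skip triples.
PAGE_QUERY = (
    "SELECT ?s ?p ?o WHERE {{ ?s ?p ?o }} "
    "ORDER BY ?s ?p ?o LIMIT {limit} OFFSET {offset}"
)


def paginated_dump(run_query: Callable[[str], List[Triple]],
                   limit: int) -> Iterator[Triple]:
    """Yield every triple by issuing LIMIT/OFFSET queries until a short page arrives."""
    offset = 0
    while True:
        page = run_query(PAGE_QUERY.format(limit=limit, offset=offset))
        yield from page
        if len(page) < limit:  # a short (or empty) page means we reached the end
            break
        offset += limit


def to_ntriples(triple: Triple) -> str:
    """Format one triple as an N-Triples line (terms are assumed pre-serialized)."""
    s, p, o = triple
    return f"{s} {p} {o} ."
```

In a real dump, each yielded triple would be passed through `to_ntriples` and written to a file; the loop itself never holds more than one page in memory, but the number of round trips grows with the dataset size.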

By comparison, Blazegraph addresses this with its com.bigdata.rdf.sail.ExportKB class, which enables offline dumps of the triplestore in various formats such as N-Quads and JSON-LD. This feature significantly simplifies managing and archiving large datasets.

My use case involves working with OpenCitations Meta, which comprises 4,236,287,432 triples for data and an additional 5,540,033,781 triples for provenance. Being able to dump our data from the triplestore into RDF formats is essential for our operations, and a similar feature in QLever would greatly benefit us and likely many others in the community.
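For a rough sense of scale, the sketch below does a back-of-the-envelope calculation for the triple counts above. It assumes a page size of 1,000,000 triples (an arbitrary choice, not a QLever default) and assumes that the endpoint has to step over all skipped rows for every OFFSET. That second assumption is a simplification: an engine may be able to skip rows more cheaply, so treat the result as a pessimistic estimate rather than a measurement.

```python
import math
from typing import Tuple

DATA_TRIPLES = 4_236_287_432  # OpenCitations Meta data triples (from the issue)
PROV_TRIPLES = 5_540_033_781  # provenance triples (from the issue)
PAGE_SIZE = 1_000_000         # assumed page size for illustration only


def pagination_cost(total: int, page_size: int) -> Tuple[int, int]:
    """Return (number of requests, total rows skipped across all OFFSETs)."""
    pages = math.ceil(total / page_size)
    # Page k (0-based) uses OFFSET k * page_size, so the skipped rows sum to
    # page_size * (0 + 1 + ... + (pages - 1)) = page_size * pages * (pages - 1) / 2.
    skipped = page_size * pages * (pages - 1) // 2
    return pages, skipped


pages, skipped = pagination_cost(DATA_TRIPLES + PROV_TRIPLES, PAGE_SIZE)
print(f"{pages:,} requests, {skipped:,} rows skipped in total")
# prints: 9,777 requests, 47,789,976,000,000 rows skipped in total
```

Under this model the skipped-row total grows quadratically with the number of pages, which is one way to see why paging through roughly 9.8 billion triples is so slow compared with a single sequential export.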

Could you consider adding such a feature to QLever or qlever-control? An offline dump feature for the triplestore that supports multiple RDF formats would be a tremendous asset, especially for those of us dealing with extensive datasets.

Thank you for considering this request. Your efforts in developing and maintaining QLever are greatly appreciated.

arcangelo7, Feb 29 '24 16:02

@arcangelo7 Two questions:

  1. Can you briefly explain the advantage of dumping the complete dataset from a SPARQL endpoint over simply downloading the dataset from which the SPARQL endpoint was built?

  2. What exactly is impractical about multiple queries involving OFFSET and LIMIT? QLever does not support OFFSET for ?s ?p ?o queries yet, but that would be a relatively easy fix.

hannahbast, Feb 29 '24 16:02