hdt-java
HDTCatTree to create an HDT with low resources using HDTCat
This pull request adds a small new method for generating an HDT, catTree.
This method creates small HDTs with the generateHDT method and merges them with HDTCat, which reduces memory usage and makes it possible to create an HDT without having enough memory to hold it.
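The chunk-then-merge idea can be illustrated with a small self-contained sketch. It does not use the hdt-java API: a "mini HDT" is modelled as a sorted, duplicate-free list of strings, and `CatTreeSketch`, `generate`, `cat`, and `maxTriplesPerChunk` are hypothetical names chosen for illustration. The merge order shown (merging partial results of equal height, like a binary counter) is an assumption and may differ from what the implementation in this PR does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model, not the hdt-java API: a "mini HDT" is a sorted,
// duplicate-free list of triple strings.
class CatTreeSketch {
    // A partial result together with its height in the merge tree.
    private static final class Node {
        final List<String> data;
        final int level;

        Node(List<String> data, int level) {
            this.data = data;
            this.level = level;
        }
    }

    // Stand-in for generateHDT: build one small result from a bounded chunk.
    static List<String> generate(List<String> chunk) {
        return new ArrayList<>(new TreeSet<>(chunk));
    }

    // Stand-in for HDTCat: merge two results into one sorted result.
    static List<String> cat(List<String> a, List<String> b) {
        TreeSet<String> merged = new TreeSet<>(a);
        merged.addAll(b);
        return new ArrayList<>(merged);
    }

    static List<String> catTree(Iterator<String> triples, int maxTriplesPerChunk) {
        if (maxTriplesPerChunk < 1) {
            throw new IllegalArgumentException("chunk size must be at least 1");
        }
        Deque<Node> stack = new ArrayDeque<>();
        while (triples.hasNext()) {
            // Read at most maxTriplesPerChunk triples, so only one chunk is buffered.
            List<String> chunk = new ArrayList<>();
            while (triples.hasNext() && chunk.size() < maxTriplesPerChunk) {
                chunk.add(triples.next());
            }
            Node current = new Node(generate(chunk), 0);
            // Assumed merge order: combine partial results of equal height.
            while (!stack.isEmpty() && stack.peek().level == current.level) {
                Node previous = stack.pop();
                current = new Node(cat(previous.data, current.data), current.level + 1);
            }
            stack.push(current);
        }
        // Merge whatever partial results are left into the final result.
        List<String> result = new ArrayList<>();
        while (!stack.isEmpty()) {
            result = cat(stack.pop().data, result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "s3 p o", "s1 p o", "s2 p o", "s1 p o", "s5 p o", "s4 p o");
        // Prints [s1 p o, s2 p o, s3 p o, s4 p o, s5 p o]
        System.out.println(catTree(input.iterator(), 2));
    }
}
```

In this model only one chunk of raw triples is buffered at a time, which is the property the description above relies on; in the real method the partial results are HDTs rather than in-memory lists.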
API Changes
It adds 3 new methods to HDTManager and 3 abstract methods to implement:
public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, String rdfFileName, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, InputStream rdfStream, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, Iterator<TripleString> iterator, String baseURI, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, String filename, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, InputStream stream, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, Iterator<TripleString> iterator, String baseURI, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
It also adds 2 new classes: HDTSupplier, to specify how to build the HDT, and RDFFluxStop, to specify when to stop the RDF stream.
Both HDTSupplier and RDFFluxStop have methods to quickly create instances.
static RDFFluxStop noLimit();
static RDFFluxStop countLimit(long maxTriple);
static RDFFluxStop sizeLimit(long maxSize);
static HDTSupplier memory();
Multiple limits can also be combined with the and and or methods:
RDFFluxStop and(RDFFluxStop other);
RDFFluxStop or(RDFFluxStop other);
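To make the idea of combinable limits concrete, here is a self-contained sketch of how count and size limits could compose. It is a model only: `LimitSketch`, `accept`, `reset`, and `LimitDemo` are hypothetical names and are not the RDFFluxStop interface from this PR. The semantics are also an assumption: here `and` keeps the chunk open only while both limits accept, and `or` keeps it open while at least one accepts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a stream limit; not the RDFFluxStop interface from the PR.
interface LimitSketch {
    // Returns true while the current chunk may still take this triple.
    boolean accept(String triple);

    // Starts counting again for the next chunk.
    void reset();

    static LimitSketch countLimit(long maxTriples) {
        return new LimitSketch() {
            private long seen;

            public boolean accept(String triple) {
                return ++seen <= maxTriples;
            }

            public void reset() {
                seen = 0;
            }
        };
    }

    static LimitSketch sizeLimit(long maxBytes) {
        return new LimitSketch() {
            private long bytes;

            public boolean accept(String triple) {
                bytes += triple.getBytes(StandardCharsets.UTF_8).length;
                return bytes <= maxBytes;
            }

            public void reset() {
                bytes = 0;
            }
        };
    }

    // Assumed semantics: continue only while BOTH limits accept.
    default LimitSketch and(LimitSketch other) {
        LimitSketch self = this;
        return new LimitSketch() {
            public boolean accept(String triple) {
                // Evaluate both sides so each limit keeps its own count up to date.
                boolean a = self.accept(triple);
                boolean b = other.accept(triple);
                return a && b;
            }

            public void reset() {
                self.reset();
                other.reset();
            }
        };
    }

    // Assumed semantics: continue while AT LEAST ONE limit accepts.
    default LimitSketch or(LimitSketch other) {
        LimitSketch self = this;
        return new LimitSketch() {
            public boolean accept(String triple) {
                boolean a = self.accept(triple);
                boolean b = other.accept(triple);
                return a || b;
            }

            public void reset() {
                self.reset();
                other.reset();
            }
        };
    }
}

class LimitDemo {
    // Counts how many triples fit in one chunk before the limit stops the stream.
    static int acceptedBeforeStop(LimitSketch limit, List<String> triples) {
        limit.reset();
        int accepted = 0;
        for (String triple : triples) {
            if (!limit.accept(triple)) {
                break;
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> triples = Collections.nCopies(5, "s p o"); // 5 bytes each
        LimitSketch both = LimitSketch.countLimit(3).and(LimitSketch.sizeLimit(10));
        LimitSketch either = LimitSketch.countLimit(3).or(LimitSketch.sizeLimit(10));
        System.out.println(acceptedBeforeStop(both, triples));   // prints 2
        System.out.println(acceptedBeforeStop(either, triples)); // prints 3
    }
}
```

Both sides of a combined limit are evaluated for every triple in this sketch so that stateful counters stay consistent; whether the real combinators behave the same way should be checked against the implementation.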
Core changes
- Implementation of HDTCatTree with tests.
- Some fixes to the header part with HDTCat.
- Removal of the System.out.println calls during HDTCat in favor of the ProgressListener.
Command-line changes
Added the -cattreelocation and -cattree options to the rdf2hdt command to use HDTCatTree.