hdt-java

HDTCatTree to create an HDT with low resources using HDTCat

Open ate47 opened this issue 1 year ago • 0 comments

This pull request adds a new method for generating an HDT, catTree.

This method creates small HDTs using the generateHDT method and merges them with HDTCat. This reduces memory usage and makes it possible to create an HDT without having enough memory to hold the whole dataset at once.
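The chunk-and-merge idea described above can be sketched as follows. This is only a simplified illustration, not the code in this pull request: a sorted set of strings stands in for an HDT, cat stands in for HDTCat, and the names CatTreeSketch, Node, cat and build are hypothetical. The merge order (joining two chunks as soon as they sit at the same height in the tree) is an assumption about how a "cat tree" could be organised. Everything here stays in memory, so the sketch shows the control flow only, not the memory savings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical model: a sorted set of triple strings stands in for an HDT.
final class CatTreeSketch {
    // A built chunk together with its height in the merge tree.
    private static final class Node {
        final TreeSet<String> triples;
        final int level;

        Node(TreeSet<String> triples, int level) {
            this.triples = triples;
            this.level = level;
        }
    }

    // Stand-in for HDTCat: merge two chunks into a new one.
    static TreeSet<String> cat(TreeSet<String> a, TreeSet<String> b) {
        TreeSet<String> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    static TreeSet<String> build(Iterator<String> triples, int maxTriplesPerChunk) {
        if (maxTriplesPerChunk < 1) {
            throw new IllegalArgumentException("maxTriplesPerChunk must be >= 1");
        }
        Deque<Node> stack = new ArrayDeque<>();
        while (triples.hasNext()) {
            // Stand-in for generateHDT on a bounded slice of the stream.
            TreeSet<String> chunk = new TreeSet<>();
            int read = 0;
            while (triples.hasNext() && read < maxTriplesPerChunk) {
                chunk.add(triples.next());
                read++;
            }
            Node node = new Node(chunk, 0);
            // Merge as long as the chunk on top of the stack has the same height.
            while (!stack.isEmpty() && stack.peek().level == node.level) {
                Node left = stack.pop();
                node = new Node(cat(left.triples, node.triples), node.level + 1);
            }
            stack.push(node);
        }
        // Fold whatever is left on the stack into the final result.
        TreeSet<String> result = new TreeSet<>();
        while (!stack.isEmpty()) {
            result = cat(stack.pop().triples, result);
        }
        return result;
    }
}
```

With this merge order, at most one pending chunk exists per tree height, so the number of chunks waiting to be merged grows logarithmically with the number of chunks read.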

API Changes

It adds 3 new static methods to HDTManager and 3 abstract methods that implementations must provide:

public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, String rdfFileName, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, InputStream rdfStream, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, Iterator<TripleString> iterator, String baseURI, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;

protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, String filename, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, InputStream stream, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, Iterator<TripleString> iterator, String baseURI, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;

It also adds 2 new classes: HDTSupplier, which specifies how to build each HDT, and RDFFluxStop, which specifies when to stop reading the RDF stream.

Both HDTSupplier and RDFFluxStop have methods to quickly create instances.

static RDFFluxStop noLimit();
static RDFFluxStop countLimit(long maxTriple);
static RDFFluxStop sizeLimit(long maxSize);
static HDTSupplier memory();

It's also possible to combine multiple limits with the and and or methods:

RDFFluxStop and(RDFFluxStop other);
RDFFluxStop or(RDFFluxStop other);
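To illustrate how such limits could compose, here is a minimal self-contained sketch. FluxStopSketch is a hypothetical stand-in, not the real RDFFluxStop: the canHandle and restart methods, the use of plain strings as triples, and the exact semantics of and (keep reading only while both limits allow it) and or (keep reading while at least one allows it) are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for RDFFluxStop; the real type may differ.
interface FluxStopSketch {
    // Returns true if this triple still fits in the current chunk.
    boolean canHandle(String triple);

    // Resets the internal counters before starting a new chunk.
    void restart();

    static FluxStopSketch noLimit() {
        return new FluxStopSketch() {
            public boolean canHandle(String triple) { return true; }
            public void restart() { }
        };
    }

    static FluxStopSketch countLimit(long maxTriples) {
        return new FluxStopSketch() {
            private long seen;
            public boolean canHandle(String triple) { return seen++ < maxTriples; }
            public void restart() { seen = 0; }
        };
    }

    static FluxStopSketch sizeLimit(long maxBytes) {
        return new FluxStopSketch() {
            private long bytes;
            public boolean canHandle(String triple) {
                bytes += triple.getBytes(StandardCharsets.UTF_8).length;
                return bytes <= maxBytes;
            }
            public void restart() { bytes = 0; }
        };
    }

    default FluxStopSketch and(FluxStopSketch other) {
        FluxStopSketch self = this;
        return new FluxStopSketch() {
            public boolean canHandle(String triple) {
                // Evaluate both sides so that both counters always advance.
                boolean a = self.canHandle(triple);
                boolean b = other.canHandle(triple);
                return a && b;
            }
            public void restart() { self.restart(); other.restart(); }
        };
    }

    default FluxStopSketch or(FluxStopSketch other) {
        FluxStopSketch self = this;
        return new FluxStopSketch() {
            public boolean canHandle(String triple) {
                boolean a = self.canHandle(triple);
                boolean b = other.canHandle(triple);
                return a || b;
            }
            public void restart() { self.restart(); other.restart(); }
        };
    }
}
```

Under these assumptions, countLimit(3).and(sizeLimit(4)) ends a chunk as soon as either limit is exceeded, while countLimit(1).or(sizeLimit(4)) ends it only once both are exceeded.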

Core changes

  • Implementation of HDTCatTree with tests.

  • Some fixes to header handling in HDTCat.

  • Removal of the System.out.println calls during HDTCat in favor of the ProgressListener.

CLI changes

Addition of the -cattreelocation and -cattree options to the rdf2hdt command to use HDTCatTree.

ate47, Sep 16 '22 14:09