Add transducer/reducer APIs for eager RDF processing

Open RickMoynihan opened this issue 7 years ago • 0 comments

This is a bit of a mega ticket which we can break down into separate tickets as and when we need them.

Generally we want to add eager processing capabilities to grafter that retain some of the composition and streaming benefits of lazy seqs, but without the GC pressure and resource life-cycle issues.

We already have some undocumented and limited support for transducers/CollReduce, though we may need to revise it or add more reducing functions.

In particular we'd like to:

  • [ ] Support CollReduce when reading RDF sources. In particular:
    • [ ] When reading triples/quads via statements. I suspect there's some mileage in making RDFParser extend CollReduce, so that we can expose the intention to parse a specific thing as a reified first-class entity and trigger the consumption elsewhere. Before conducting this work it might also be worth considering upgrading to RDF4j #95.
    • [ ] On SPARQL queries, allowing you to build a query and consume the results with things like (into [] xform query), with all resources etc. cleaned up afterwards.
  • [ ] Support reducing functions for adding/writing data to RDF destinations.
  • [ ] Support (into sparql-repo xform quads). (IIRC this might already be supported, but we should check it works in all the contexts we require.)
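
To make the shape of this concrete, here is a rough sketch in plain Clojure, with no grafter or RDF4j calls. Everything named here (reducible-lines, writing-rf, the space-separated "s p o" line format) is hypothetical and only stands in for a real parser and a real RDF destination. It shows a reified source that opens its resource only when reduced and closes it afterwards (even on early termination), plus a reducing function that writes to a destination.

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.string :as str])
(import '(java.io BufferedReader StringReader StringWriter Writer))

;; Hypothetical stand-in for "the intention to parse a specific thing".
;; open-fn is a zero-arg fn returning a fresh java.io.Reader; nothing is
;; opened until the value is actually reduced.
(defn reducible-lines [open-fn]
  (reify p/CollReduce
    ;; Simplification: the no-init arity seeds with (f) rather than using
    ;; the first element as clojure.core/reduce does for ordinary colls.
    (coll-reduce [this f]
      (p/coll-reduce this f (f)))
    (coll-reduce [_ f init]
      ;; with-open closes the reader on normal completion, on early
      ;; termination via reduced, and on exceptions.
      (with-open [rdr (BufferedReader. (open-fn))]
        (loop [acc init]
          (if (reduced? acc)
            @acc
            (if-let [line (.readLine rdr)]
              (recur (f acc line))
              acc)))))))

;; Hypothetical reducing function for a destination: writes each statement
;; to w and returns the number of statements written.
(defn writing-rf [^Writer w]
  (fn
    ([] 0)                                   ; init
    ([n] (.flush w) n)                       ; completion
    ([n {:keys [s p o]}]                     ; step
     (.write w ^String (str s " " p " " o " .\n"))
     (inc n))))

;; Assumed toy input format: one whitespace-separated "s p o" per line.
(def source (reducible-lines #(StringReader. "a b c\nd e f\ng h i\n")))
(def parse (map #(zipmap [:s :p :o] (str/split % #"\s+"))))

;; Consume eagerly; (take 2) stops early and the reader is still closed.
(def first-two (into [] (comp parse (take 2)) source))

;; Stream everything into a destination via a reducing function.
(def out (StringWriter.))
(def written (transduce parse (writing-rf out) source))
```

Because source is just a description of what to read, it can be reduced more than once and passed around freely; the resource only exists for the duration of each reduce. A real version would presumably swap the line reader for the RDF parser and the Writer for a repository connection.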

RickMoynihan, Jul 13 '17 11:07