PheKnowLator

Other: Also use persistent RDFlib store for output graphs

Open · zmaas opened this issue 3 years ago · 1 comment

Once a graph has been built, it may be useful to also import the resulting .owl file into an RDFlib persistent store. A persistent store lets RDFlib access the graph without loading the entire structure into memory, which can be a real advantage when working with large graphs. Below is a sample implementation that uses Berkeley DB as the persistent backend, which RDFlib supports out of the box. Note that Berkeley DB was formerly developed by Sleepycat Software, hence the backend name "Sleepycat" when creating the Graph object.

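```python
import rdflib

# Hypothetical placeholders: adjust these to your environment
uri = "path/to/graph_store"  # directory that will hold the Berkeley DB files
identifier = "https://example.org/pkt-graph"  # name of the graph within the store
graph_path = "path/to/graph.owl"  # RDF file produced by the build

# The persistent store requires an identifier
graph_id = rdflib.URIRef(identifier)
# Create the graph backed by the "Sleepycat" Berkeley DB store
graph = rdflib.Graph("Sleepycat", identifier=graph_id)
# Open the store, creating it if it doesn't exist
graph.open(uri, create=True)
# Parse the file at graph_path, typically RDF/XML formatted.
# This could take many hours if the graph is large.
graph.parse(graph_path, format="xml")
# Close the store to flush pending writes to disk and release its file handles
graph.close()
```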

Alternatively, the following code wraps the above functionality in a context manager, so the graph can be used inside a `with` block and is closed automatically when the block exits:

```python
from contextlib import contextmanager

import rdflib
from rdflib.store import NO_STORE


@contextmanager
def open_persistent_graph(uri, identifier, graph_path=None):
    """Provides a context manager for working with an OWL graph while also
    automatically closing it afterward. URI is the location of the
    graph store directory and IDENTIFIER is the name of the graph
    within that store. Optional argument GRAPH_PATH specifies an
    appropriately formatted RDF file to import when opening the graph.

    """
    # Only force create if a path is provided
    create_graph = bool(graph_path)
    # Open the on-disk store before entering the try block, so a failure
    # here never calls close() on a graph that was not opened
    graph_id = rdflib.URIRef(identifier)
    graph = rdflib.Graph("Sleepycat", identifier=graph_id)
    if graph.open(uri, create=create_graph) == NO_STORE:
        raise FileNotFoundError(f"No graph store found at {uri}")
    try:
        # Parse the file at GRAPH_PATH if set (RDF/XML assumed)
        if graph_path:
            graph.parse(graph_path, format="xml")
        yield graph
    finally:
        # Flush pending writes and release the database handles
        graph.close()
```

zmaas · Oct 09 '20 21:10

Thanks so much @zmaas! This is great. I plan to leave this issue open until we can address it during the rebuild next month. Would it be OK with you if I circle back when we reach the re-implementation stage?

callahantiff · Oct 11 '20 19:10