sparql.anything icon indicating copy to clipboard operation
sparql.anything copied to clipboard

ondisk property and BaseFacadeXGraphBuilder.previousTDB2Path

Open luigi-asprino opened this issue 2 years ago • 3 comments

I would like to discuss a few design choices related to ondisk feature. This issue relates to the PR #185 issues #280 #295 #296 and #298 (at least).

First, why do we need BaseFacadeXGraphBuilder.previousTDB2Path field? The BaseFacadeXGraphBuilder is instantiated once for each query execution. Therefore, in the case that field is static the previous path might point to a TDB used for a previous query, and, we don't want that the execution of a query affects the processing of the following ones. If not static, the previous path is always "". Maybe we don't need it, or am I missing something here?

BaseFacadeXGraphBuilder.getDatasetGraph() is supposed to create the DatasetGraph where the triples will be stored. In the case of ondisk property set, the DatasetGraph has to be a TDB2, an in-memory implementation. otherwise. We may discuss what to do in the case that the TDB already exists (that is, deleting the directory or not, or let the user decide - e.g. with ondisk.reuse), but I would keep the method very simple.

The other thing that I find strange is the ondisk property itself. The current behaviour is:

  • if the path string passed as a parameter is a directory, then, create a temporary directory inside ondisk path for the TDB.
  • otherwise, create a temporary directory in \tmp Well, if we remove the previousTDB2Path field, there is no way to tell the engine to use an existing TDB located at a certain path.

I'd suggest the following.

  • ondisk becomes a true/false option. false is the default value. true, means tells the engine to create a TDB in a temporary directory in /tmp or (preferably) System.getProperty("java.io.tmpdir")
  • introduce ondisk.path to tell the engine the path of the TDB (this will be created in case it doesn't already exist).
  • introduce ondisk.reuse to tell the engine to wipe the directory before creating the new TDB. Default value = false

WDYT?

luigi-asprino avatar Aug 31 '22 21:08 luigi-asprino

First, why do we need BaseFacadeXGraphBuilder.previousTDB2Path field?

if the user specifies the ondisk.reuse option we need to know where that TDB2 (used in a previous query) lived on the disk so that we can reuse it. we aren't reusing the whole /tmp directory, we are reusing the specific TDB2 e.g. /tmp/3477575753 and we need to store that directory name somewhere.

i think your suggestions are good. :+1:

justin2004 avatar Sep 01 '22 13:09 justin2004

First, why do we need BaseFacadeXGraphBuilder.previousTDB2Path field?

if the user specifies the ondisk.reuse option we need to know where that TDB2 (used in a previous query) lived on the disk so that we can reuse it. we aren't reusing the whole /tmp directory, we are reusing the specific TDB2 e.g. /tmp/3477575753 and we need to store that directory name somewhere.

i think your suggestions are good. 👍

I agree with removing the static fields and ask the users to point at the TDB2 folder to be reused. Consider also that static fields won't be a solution for the CLI. Also, I would avoid generating temporary folders and ask the user where they want to store the temporary database and if they want it to be deleted at the end of the process or not.

enridaga avatar Sep 01 '22 14:09 enridaga

Ok, so we all agree to remove the static fields. Maybe, we can also avoid ondisk.path and use ondisk for passing the path of the TDB explicitly. And also avoid the generation of the temporary path for the TDB2 and leave the user to point at the TDB2 folder explicitly, so there is no need to store the previous TDB path. Great! I proceed with the implementation of these changes.

luigi-asprino avatar Sep 01 '22 14:09 luigi-asprino

With f1355c0 the creation of the DatasetGraph is as follows. If the ondisk property is set, a TDB is created at the location passed as a value (in the case that ondisk.reuse is set as false, the location is deleted before creating the TBD). By default, ondisk.reuse is true (I think that false would cause accidental deletion of folders).

luigi-asprino avatar Sep 04 '22 16:09 luigi-asprino

I close this issue as it is completed via f1355c0

luigi-asprino avatar Sep 04 '22 16:09 luigi-asprino