Precompute expanded relations as a temporary table
I have a query that correctly performs an ILI-expanded synset relation traversal, but it is rather complicated and wouldn't lend itself well to a recursive query (see #38). This means it only works one hop at a time and doesn't grant much performance bonus, although it does make the Python code simpler (while making the SQL more complex):
WITH
rt(rowid, type) AS
(SELECT rowid, type FROM relation_types WHERE type IN (?, ...)),
lex(rowid) AS (VALUES (?), ...),
intra(rowid, source_rowid, target_rowid, type) AS
(SELECT rel.rowid, source_rowid, target_rowid, rt.type
FROM synset_relations AS rel
JOIN rt ON rel.type_rowid = rt.rowid
WHERE rel.lexicon_rowid IN lex
AND rel.source_rowid = ?),
inter(rowid, source_rowid, target_rowid, type) AS
(SELECT rel.rowid, source_rowid, target_rowid, rt.type
FROM synset_relations AS rel
JOIN rt ON rel.type_rowid = rt.rowid
WHERE rel.lexicon_rowid IN (?, ...)
AND rel.source_rowid IN
(SELECT ss1.rowid
FROM synsets AS ss1
JOIN ilis ON ilis.rowid = ss1.ili_rowid
WHERE ilis.id = ?
AND ss1.rowid != ?))
SELECT intra.type, intra.rowid, tgt.id, tgt.pos,
ilis.id, tgt.lexicon_rowid, tgt.rowid
FROM intra
JOIN synsets AS tgt ON intra.target_rowid = tgt.rowid
LEFT JOIN ilis ON ilis.rowid = tgt.ili_rowid
WHERE tgt.lexicon_rowid IN lex
UNION
SELECT inter.type, inter.rowid,
CASE WHEN ss.lexicon_rowid IN lex THEN ss.id ELSE null END,
ss.pos, ilis.id,
CASE WHEN ss.lexicon_rowid IN lex THEN ss.lexicon_rowid ELSE null END,
CASE WHEN ss.lexicon_rowid IN lex THEN ss.rowid ELSE null END
FROM inter
JOIN synsets AS _tgt ON inter.target_rowid = _tgt.rowid
JOIN synsets AS ss ON _tgt.ili_rowid = ss.ili_rowid
JOIN ilis ON ilis.rowid = ss.ili_rowid
WHERE _tgt.ili_rowid IS NOT NULL;
Since all these joins and checks occur each time a relation is traversed, it's a bit wasteful. We could precompute them into a temporary table with something like this:
CREATE TEMPORARY TABLE IF NOT EXISTS virtual_synset_relations (
wnid INTEGER NOT NULL,
source_rowid INTEGER NOT NULL,
target_rowid INTEGER,
target_ili INTEGER,
synset_relation_rowid INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS temp.virtual_synset_relations_source
ON virtual_synset_relations(source_rowid);
WITH
lex(rowid) AS (VALUES (?), ...),
rel(src_ili_rowid, tgt_ili_rowid, rowid) AS
(SELECT _src.ili_rowid, _tgt.ili_rowid, _rel.rowid
FROM synset_relations AS _rel
JOIN synsets AS _src ON _src.rowid = _rel.source_rowid
JOIN synsets AS _tgt ON _tgt.rowid = _rel.target_rowid
WHERE _rel.lexicon_rowid IN (?, ...))
INSERT INTO virtual_synset_relations
SELECT (SELECT ifnull(max(wnid), 0) + 1 FROM virtual_synset_relations),
src.rowid, tgt.rowid, tgt.ili_rowid, rel.rowid
FROM rel
JOIN synsets AS src ON src.ili_rowid = rel.src_ili_rowid
JOIN synsets AS tgt ON tgt.ili_rowid = rel.tgt_ili_rowid
WHERE src.lexicon_rowid IN lex
AND tgt.lexicon_rowid IN lex;
This creates a temporary table that, for a wordnet (defined as a set of target and expand lexicons), lists all relations that would be available via expansion, but not those in the original lexicons (unless those lexicons are also in the expand set). Creating this table for the Chinese or French wordnets, expanded with PWN 3.0, takes less than 2 seconds each on my machine. It might be less if the table were created in memory only.
I have not yet performed any queries with this to estimate the performance of lookup (e.g., if more indexes are required), or estimated the memory requirements. If it doesn't take much memory, it might make sense to also include non-expanded relations so lookup only needs to access one table of relations.
Forgot to mention that this would be computed when a user creates a Wordnet object and not permanently stored in the database. This is because the temporary table becomes invalid when the data changes (a lexicon is added, removed, or modified), and also because it wouldn't be efficient to precompute all possible permutations of target + expand lexicons. With LMF 1.1 it's possible to anticipate those with the <Requires> element, so those could be done early, but it's not guaranteed the user wants to use that set, and if it doesn't take long to create a temporary table there's little to gain by doing it early.
The 2-seconds to build climbs quite a bit when many lexicons are loaded. When all of OMW is added, it can be over 10 seconds. Using the SQLite pragma for in-memory temporary tables didn't help. This probably shouldn't be done by default when creating a Wordnet object.