tntsearch
Build an Index Manually Without a Datastore (no initial seeding of index)
From my comment on issue #118...
My data, as it exists in the DB, is not in a format ready to be indexed. So I was hoping to start with an empty index and then add records as the publishing process creates them. This means I need to create an index without connecting to a database, using only manual insertion (`$index->insert(...)`).
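Since the records have to be reshaped before indexing anyway, a small mapping step can run inside the publishing process. Below is a minimal sketch, assuming a hypothetical `toIndexDocument()` helper and hypothetical source fields `id`, `title` and `body`. It only builds the `['id' => ..., 'content' => ...]` array passed to `insert()` in this post and does not call TNTSearch itself.

```php
<?php
// Hypothetical helper: flatten a published record into the
// ['id' => ..., 'content' => ...] shape passed to insert().
// The input field names ('id', 'title', 'body') are assumptions.
function toIndexDocument(array $record): array
{
    // Strip markup so only searchable text ends up in 'content'.
    $title = trim(strip_tags($record['title'] ?? ''));
    $body  = trim(strip_tags($record['body'] ?? ''));

    // Collapse runs of whitespace left behind by removed tags.
    $content = preg_replace('/\s+/', ' ', trim($title . ' ' . $body));

    return [
        'id'      => (string) $record['id'],
        'content' => $content,
    ];
}

$doc = toIndexDocument([
    'id'    => 1,
    'title' => 'New awesome article',
    'body'  => '<p>All about <strong>PHP</strong>.</p>',
]);
print_r($doc);
```

The returned array can then be handed straight to the indexer's `insert()` as each record is published.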
Doing the following:
```php
use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig([
    'storage' => __DIR__.'/tntsearch/indexes/'
]);
$indexer = $tnt->createIndex('test.index');
$indexer->insert(['id' => '1', 'content' => 'some awesome searchable content']); // I want to ONLY do this.
```
This gives an error because no database driver is specified in the config. If I put some dummy MySQL settings in there, it complains because it tries to connect to the database.
I have already created an empty `test.index` file in the storage path.
How can I use ONLY `$index->insert(...)` to add my data and skip connecting to the DB? Is there a 'none' option for the driver?
Well, I may have come up with a workaround:
Use the filesystem as your driver, and point it at an empty directory for the initial seeding of the index.
```php
use TeamTNT\TNTSearch\TNTSearch;

// TNT Search
$tnt = new TNTSearch;
$tnt->loadConfig([
    'driver'    => 'filesystem',
    'location'  => __DIR__.'/tntsearch/dummysource/', // empty source directory
    'extension' => 'txt',
    'storage'   => __DIR__.'/tntsearch/indexes/'
]);

// Build the index from the empty directory, so it starts with zero records.
$indexer = $tnt->createIndex('test.index');
$indexer->run();

// Add the records manually.
$indexer->insert(['id' => '1', 'content' => 'new awesome article about php']);
$indexer->insert(['id' => '2', 'content' => 'another article about php']);
$indexer->insert(['id' => '3', 'content' => 'read this one because it is cool.']);
$indexer->insert(['id' => '4', 'content' => 'some stuff about interesting things']);
```
The index is first built from an empty directory, so it starts with zero records. Then I manually insert four records.
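The workaround relies on the dummy source directory existing and being empty. A minimal sketch, assuming a hypothetical `ensureDir()` helper (not part of TNTSearch) that you would call before `createIndex()`: it creates the directories if they are missing and throws if the source directory already contains anything, so the initial `run()` has nothing to read.

```php
<?php
// Hypothetical helper, not part of TNTSearch: create $dir if it is missing
// and, when $mustBeEmpty is true, throw if it already contains entries.
function ensureDir(string $dir, bool $mustBeEmpty = false): void
{
    // mkdir() can fail if another process created the directory first,
    // so re-check is_dir() before treating that as an error.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory: $dir");
    }

    if ($mustBeEmpty) {
        // scandir() always lists '.' and '..'; anything else is content.
        $entries = array_diff(scandir($dir), ['.', '..']);
        if ($entries !== []) {
            throw new RuntimeException("Expected an empty directory: $dir");
        }
    }
}

// Example with a temporary base path; in the post the base is __DIR__.'/tntsearch'.
$base = sys_get_temp_dir() . '/tntsearch-demo-' . getmypid();
ensureDir($base . '/dummysource', true); // source for the initial run()
ensureDir($base . '/indexes');           // index storage; may already hold test.index
```

The storage directory is deliberately not required to be empty, since it may already contain the index file.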
Then query the index as normal:

```php
$tnt->selectIndex('test.index');
$results = $tnt->search('article', 12); // second argument: maximum number of results
```
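To make the insert-then-search flow concrete, here is a conceptual sketch of an index filled purely by manual inserts, with no datastore and no seeding pass. It is a toy in-memory inverted index using a hypothetical `ToyIndex` class; it is not how TNTSearch stores or ranks results, and it only does exact, lowercase ASCII word matching where every query word must appear.

```php
<?php
// Conceptual sketch only (hypothetical class, not TNTSearch's storage format):
// an in-memory inverted index that is filled purely by insert() calls.
final class ToyIndex
{
    /** @var array<string, array<int|string, bool>> word => set of document ids */
    private array $postings = [];

    public function insert(array $doc): void
    {
        foreach ($this->tokenize($doc['content']) as $word) {
            $this->postings[$word][$doc['id']] = true;
        }
    }

    /** Return up to $limit ids of documents containing every query word. */
    public function search(string $phrase, int $limit): array
    {
        $ids = null;
        foreach ($this->tokenize($phrase) as $word) {
            // PHP turns numeric string keys such as '1' into ints; cast back.
            $hits = array_map('strval', array_keys($this->postings[$word] ?? []));
            $ids = $ids === null ? $hits : array_values(array_intersect($ids, $hits));
        }
        return array_slice($ids ?? [], 0, $limit);
    }

    private function tokenize(string $text): array
    {
        // Lowercase, then split on anything that is not an ASCII letter or digit.
        return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }
}

$index = new ToyIndex();
$index->insert(['id' => '1', 'content' => 'new awesome article about php']);
$index->insert(['id' => '2', 'content' => 'another article about php']);
$index->insert(['id' => '3', 'content' => 'read this one because it is cool.']);
$index->insert(['id' => '4', 'content' => 'some stuff about interesting things']);

print_r($index->search('article', 12)); // ids '1' and '2'
```

Each insert only touches the postings for that document's words, which is why a structure like this has no source to read before the first insert.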
So the filesystem workaround might work for now, but I'm wondering whether there is a better way to do this that skips the initial delay while it reads the empty source directory.