bleve to get full text search and facets

Open gedw99 opened this issue 3 years ago • 4 comments

Proposal

Add bleve support ( https://github.com/blevesearch )

Motivation

Provides FTS, like SQLite and other DBs have, to allow searching over documents. https://sqlite.org/fts3.html

Provides facet-based data analysis. A good demo of that concept is in the video here: https://datasette.io/ In the Datasette demo, every column can be faceted: https://global-power-plants.datasettes.com/global-power-plants/global-power-plants

  • this is a very powerful construct for developers and users
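
To make the facet idea concrete, here is a minimal sketch of what a facet computes: for one field, the distinct values and how many documents carry each. The `Doc` type, the `Facet` function and the sample data are all made up for illustration; this is not the bleve or genji API.

```go
package main

import (
	"fmt"
	"sort"
)

// Doc is a hypothetical schemaless document: field name -> value.
type Doc map[string]string

// FacetCount is one facet bucket: a distinct value and its document count.
type FacetCount struct {
	Value string
	Count int
}

// Facet counts the distinct values of field across docs. Buckets are
// sorted by descending count, then by value so the order is stable.
// Documents that lack the field are skipped.
func Facet(docs []Doc, field string) []FacetCount {
	counts := make(map[string]int)
	for _, d := range docs {
		if v, ok := d[field]; ok {
			counts[v]++
		}
	}
	out := make([]FacetCount, 0, len(counts))
	for v, n := range counts {
		out = append(out, FacetCount{Value: v, Count: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	// Invented sample rows, loosely shaped like the power plants demo.
	docs := []Doc{
		{"country": "US", "fuel": "Solar"},
		{"country": "US", "fuel": "Gas"},
		{"country": "FR", "fuel": "Nuclear"},
	}
	for _, fc := range Facet(docs, "country") {
		fmt.Printf("%s: %d\n", fc.Value, fc.Count)
	}
}
```

A real search engine would compute these buckets from its index instead of scanning every document, but the result a user sees has this shape.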

Design

For example, SQLite needs a special table in order to do FTS. Note that facets are a different feature and would need a different DSL.

For example, if each of the 517430 documents in the "Enron E-Mail Dataset" is inserted into both an FTS table and an ordinary SQLite table created using the following SQL script:

CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT);     /* FTS3 table */
CREATE TABLE enrondata2(content TEXT);                        /* Ordinary table */

Then either of the two queries below may be executed to find the number of documents in the database that contain the word "linux" (351). Using one desktop PC hardware configuration, the query on the FTS3 table returns in approximately 0.03 seconds, versus 22.5 for querying the ordinary table.

SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux';  /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
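
The gap between those two timings comes from the data structure, which a toy sketch can show: an inverted index answers a term query with one map lookup, while a LIKE-style query has to test every document. Everything below (`Index`, `Build`, `CountMatch`, `CountScan`, the tokenizer) is invented for illustration and is far simpler than what FTS3 or bleve actually do.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Index is a toy inverted index: lower-cased term -> set of document IDs.
type Index map[string]map[int]struct{}

// tokenize splits text into lower-cased runs of letters and digits.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Build indexes every document once; the slice position is the document ID.
func Build(docs []string) Index {
	idx := make(Index)
	for id, doc := range docs {
		for _, term := range tokenize(doc) {
			if idx[term] == nil {
				idx[term] = make(map[int]struct{})
			}
			idx[term][id] = struct{}{}
		}
	}
	return idx
}

// CountMatch answers a single-term query with one map lookup.
func (idx Index) CountMatch(term string) int {
	return len(idx[strings.ToLower(term)])
}

// CountScan mimics LIKE '%term%': a substring test on every document.
func CountScan(docs []string, term string) int {
	needle := strings.ToLower(term)
	n := 0
	for _, doc := range docs {
		if strings.Contains(strings.ToLower(doc), needle) {
			n++
		}
	}
	return n
}

func main() {
	docs := []string{"I run Linux at home", "Windows only", "linux, and more Linux"}
	idx := Build(docs)
	fmt.Println(idx.CountMatch("linux"), CountScan(docs, "linux"))
}
```

Note the two functions are not equivalent: the index matches whole terms, while the scan matches any substring, so they can return different counts for the same input.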

Prior work:

https://github.com/mosuka/blast https://github.com/mosuka/blast#search-documents

$ ./bin/blast search '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

gedw99 avatar Mar 20 '21 10:03 gedw99

A good implementation might be to store the FTS properties in special tables and map them to a genjidb table. While this could be the simplest implementation, it would limit queries to FTS only, and it wouldn't be possible to pass any other WHERE clause.

tomasweigenast avatar May 26 '21 08:05 tomasweigenast

Thanks for the suggestion.

A simple example of bleve is here, along with a simple GUI. It’s a good baseline.

https://github.com/blugelabs/beer-search

It creates a File System store.

There is no reason why the genji store and this bleve store can't operate separately.

In terms of linking the two, the metadata of the bleve store could be stored in genji, as you suggest.

In your middle tier you would then update bleve and genji with separate calls.

In terms of GUI, a typical use, as you see in the beer-search example, is faceted search, which does not map well to database patterns. So a faceted search would tend to be on a different page from a page that needs the genji DB.

Faceted search could be used to do high-level cross-object searches, and then once you identify in the results the objects you're interested in, your GUI starts to use the genji DB.

So the genji table name and document / row ID would need to be saved in the bleve store, which just requires some hooks, I suspect. When data is updated in genji, you then need to tell bleve the ID and data so that it can reindex the bleve store.
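
A rough sketch of that hook idea, using only in-memory stand-ins: `SyncedTable` plays the role of a genji table, and `Indexer` is a hypothetical interface shaped loosely like bleve's `Index(id, data)` and `Delete(id)` methods. None of these names come from genji or bleve.

```go
package main

import "fmt"

// Row is a hypothetical document: field name -> value.
type Row map[string]string

// Indexer is the small slice of a search index that the hook needs.
type Indexer interface {
	Index(id string, data Row) error
	Delete(id string) error
}

// memIndex is a stand-in index that only remembers what it was given.
type memIndex struct{ docs map[string]Row }

func (m *memIndex) Index(id string, data Row) error { m.docs[id] = data; return nil }
func (m *memIndex) Delete(id string) error          { delete(m.docs, id); return nil }

// SyncedTable stands in for one database table whose writes are
// mirrored to a search index.
type SyncedTable struct {
	name string
	rows map[string]Row
	idx  Indexer
}

// NewSyncedTable creates an empty table that mirrors writes to idx.
func NewSyncedTable(name string, idx Indexer) *SyncedTable {
	return &SyncedTable{name: name, rows: make(map[string]Row), idx: idx}
}

// searchID joins table name and row ID, so a search hit can be traced
// back to the row it came from.
func (t *SyncedTable) searchID(rowID string) string { return t.name + "/" + rowID }

// Put stores the row, then asks the index to (re)index it.
func (t *SyncedTable) Put(rowID string, data Row) error {
	t.rows[rowID] = data
	return t.idx.Index(t.searchID(rowID), data)
}

// Delete removes the row, then removes it from the index.
func (t *SyncedTable) Delete(rowID string) error {
	delete(t.rows, rowID)
	return t.idx.Delete(t.searchID(rowID))
}

func main() {
	idx := &memIndex{docs: make(map[string]Row)}
	beers := NewSyncedTable("beers", idx)
	if err := beers.Put("42", Row{"name": "IPA"}); err != nil {
		fmt.Println("index error:", err)
	}
	fmt.Println(len(idx.docs))
}
```

The two writes are not atomic here: if indexing fails after the row is stored, the two stores disagree until the row is written again, so a real integration would need some retry or rebuild path.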

That would at least get them working together.

Later I can imagine a tighter integration, but I am almost certain the genji maintainers would not be up for this. Instead, a driver for bleve could perhaps be built that uses genji as the store.

Right now I am working on getting genji working with IndexedDB, so that it’s possible to build golang GUIs by cross-compiling to wasm with genji embedded, just like how you can use genji to build golang mobile and desktop apps.

Then it would make sense to look at a bleve driver that uses genji, because then we would be able to build for wasm, desktop and mobile and have a genji DB embedded.

The GUI is gioui. It’s pretty cool for a golang dev to be able to use golang and only golang to build their GUI and server. I am currently working on this and will put up a full working demo on GitHub soon.

https://github.com/hack-pad/hackpadfs

uses

https://github.com/hack-pad/go-indexeddb

Contact me if you're curious.

Or let me know what you think in general.

gedw99 avatar Aug 10 '21 12:08 gedw99

If you're curious, I raised an issue about getting Bluge running in a browser as a first step:

https://github.com/blugelabs/bluge/issues/72

gedw99 avatar Aug 10 '21 12:08 gedw99

Hey, is this still active? So sorry I stopped answering, I was busy working, but now I can help with anything you need. @gedw99

tomasweigenast avatar Nov 23 '22 11:11 tomasweigenast