Explore options for fuzzy-match and search suggestions
The built-in MapAdapter and the external databroker.mongo_normalized adapter support the FullText query. We will add support for FullText in the built-in SQL-backed Catalog Adapter in #456 and #457, for SQLite and PostgreSQL respectively.
Next, we should consider fuzzy match and search suggestions. This has often been done with the ELK stack, but that is a heavy stack to take on for the sake of just one of its features. What are our options?
@Kezzsim highlighted the project typesense, which is exactly targeted at serving this use case without taking on the weight of ELK.
Also, I believe there is some functionality in this space available in SQLite and PostgreSQL. While not at the level of ELK, it would be good to understand precisely how far we can get with the tech stack we already have, and what its limitations are.
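As one data point on what the existing stack can do, here is a minimal sketch of prefix-based search suggestions using SQLite's FTS5 extension from Python's standard library. It assumes the local SQLite build has FTS5 compiled in. The table name `suggestions`, the column `label`, and the helper `suggest` are made up for illustration and are not part of Tiled. Note that this is prefix matching only, not typo-tolerant fuzzy matching.

```python
import sqlite3


def suggest(conn, prefix, limit=5):
    """Return stored labels containing a token that starts with `prefix`."""
    if not prefix:
        return []
    # Quote the user input as an FTS5 string, then append * to make it a prefix query.
    match_expr = '"' + prefix.replace('"', '""') + '"*'
    rows = conn.execute(
        "SELECT label FROM suggestions WHERE suggestions MATCH ? "
        "ORDER BY rank LIMIT ?",
        (match_expr, limit),
    )
    return [row[0] for row in rows]


# Hypothetical in-memory index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE suggestions USING fts5(label)")
conn.executemany(
    "INSERT INTO suggestions (label) VALUES (?)",
    [("detectors",), ("detector noise",), ("motor position",)],
)
print(suggest(conn, "det"))
```

On the PostgreSQL side, the `pg_trgm` extension is the usual starting point for similarity matching. Neither of these covers everything a dedicated search engine offers.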
In discussions with @Kezzsim, we are going ahead with TypeSense, as an optional add-on in the same way that Prometheus is an optional add-on.
I think that this will involve:
- Adding a new optional argument `typesense` to the Catalog constructors, which takes `None` (the default, meaning no Typesense) or a config dict like
{
    'api_key': 'Hu52dwsas2AdxdE',
    'nodes': [{
        'host': 'localhost',
        'port': '8108',
        'protocol': 'http'
    }],
    'connection_timeout_seconds': 2
}
https://github.com/bluesky/tiled/blob/c76d1b3bf0468df8497568dfd9d6580207479a40/tiled/catalog/adapter.py#L1135-L1169
Tiled config like:
trees:
  - tree: catalog
    args:
      uri: postgresql+asyncpg://...
      typesense:
        api_key: $TYPESENSE_API_KEY
        nodes:
          - host: localhost
            port: 8108
            protocol: http
        connection_timeout_seconds: 2
will just work, with no code changes to the config parser.
- Passing that config dict into `Context.__init__` and creating an instance of `typesense.Client`, held as `self.typesense_client` on the `Context`.
https://github.com/bluesky/tiled/blob/c76d1b3bf0468df8497568dfd9d6580207479a40/tiled/catalog/adapter.py#L111-L119
- Also in `Context.__init__`, registering [after_insert](https://docs.sqlalchemy.org/en/20/orm/events.html#sqlalchemy.orm.MapperEvents.after_insert) and `after_update` SQLAlchemy events that make the relevant calls from `self.typesense_client`. (I remain not entirely clear what these hooks give you access to, but the docs look promising.)
- Adding a new module `tiled.commandline._typesense` and updating `tiled.commandline.main` to add a `tiled typesense` subcommand to the CLI. I imagine we will need:
tiled typesense init TYPESENSE_URL [ANOTHER_TYPESENSE_URL] # define schemas
tiled typesense rebuild TYPESENSE_URL [ANOTHER_TYPESENSE_URL] # drop data (if any) and rebuild
The utility urllib.parse.urlparse can be used to get from a CLI-friendly string like http://localhost:8108?api_key=Hu52dwsas2AdxdE into the structure:
{
    'api_key': 'Hu52dwsas2AdxdE',
    'nodes': [{
        'host': 'localhost',
        'port': '8108',
        'protocol': 'http'
    }],
    'connection_timeout_seconds': 2
}
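A minimal sketch of that URL-to-config conversion, using only the standard library. The helper name `typesense_config_from_url` is hypothetical, and the fallback port 8108 and the default timeout of 2 seconds are assumptions chosen to match the example above, not settled behavior.

```python
from urllib.parse import parse_qs, urlparse


def typesense_config_from_url(url, connection_timeout_seconds=2):
    """Convert a CLI-friendly URL into a Typesense-style client config dict.

    Hypothetical helper; the name and defaults are illustrative only.
    """
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"No host found in {url!r}")
    # parse_qs maps each query parameter to a list of values.
    api_keys = parse_qs(parsed.query).get("api_key", [])
    if len(api_keys) != 1:
        raise ValueError("Expected exactly one api_key query parameter")
    return {
        "api_key": api_keys[0],
        "nodes": [
            {
                "host": parsed.hostname,
                # Port kept as a string, as in the example dict.
                # Assumption: fall back to 8108 when the URL has no port.
                "port": str(parsed.port or 8108),
                "protocol": parsed.scheme,
            }
        ],
        "connection_timeout_seconds": connection_timeout_seconds,
    }


print(typesense_config_from_url("http://localhost:8108?api_key=Hu52dwsas2AdxdE"))
```

Passing the API key in the URL keeps the CLI to a single argument per node, at the cost of the key showing up in shell history; an environment variable would be an alternative worth considering.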
All of the above is up for a rethink. It is just meant as a quick sketch to highlight the relevant sections of the Tiled code that I can see will need to be touched.
From discussion on 20 Feb:
- The TypeSense ingestion (both at initialization and via the trigger) will ignore any nodes that do not meet some list of approved "specs" that TypeSense knows what to do with.
- There will be additional configuration, passed to tiled, along these lines:
typesense_ingestion:
  - spec: BlueskyRun
    fields:
      - name: detectors  # field name in TypeSense
        path: "start.detectors"  # path into Tiled JSON metadata
        # Also type?
  - spec: SomeOtherThing
    ...
https://github.com/bluesky/event-model/blob/main/event_model/schemas/run_start.json
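To make the spec filtering and path lookup concrete, here is a rough sketch of how an ingestion config like the one above might be applied to a node. The function names `get_by_path` and `build_document` are hypothetical, and the choices to take the first matching spec and to silently skip missing paths are assumptions, not decisions from the discussion.

```python
def get_by_path(metadata, dotted_path):
    """Follow a dotted path such as "start.detectors" into nested dicts."""
    value = metadata
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def build_document(node_specs, metadata, ingestion_config):
    """Return a flat document for the first matching spec, or None to skip the node."""
    for rule in ingestion_config:
        if rule["spec"] in node_specs:
            doc = {}
            for field in rule["fields"]:
                value = get_by_path(metadata, field["path"])
                # Assumption: fields missing from the metadata are left out.
                if value is not None:
                    doc[field["name"]] = value
            return doc
    # No approved spec on this node, so it is ignored.
    return None


ingestion_config = [
    {
        "spec": "BlueskyRun",
        "fields": [{"name": "detectors", "path": "start.detectors"}],
    }
]
metadata = {"start": {"detectors": ["det1", "det2"]}}
print(build_document(["BlueskyRun"], metadata, ingestion_config))
print(build_document(["SomethingElse"], metadata, ingestion_config))
```

The same function could serve both the initial bulk ingestion and the per-row trigger, since it depends only on a node's specs and metadata.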
# config.yml
authentication:
  # The default is false. Set to true to enable any HTTP client that can
  # connect to _read_. An API key is still required to write.
  allow_anonymous_access: false
  single_user_api_key: "secret"  # for dev
trees:
  - path: /
    tree: catalog
    args:
      uri: "sqlite+aiosqlite:///:memory:"
      # or, uri: "sqlite+aiosqlite:////catalog.db"
      # or, "postgresql+asyncpg://..."
      writable_storage: "data/"
      init_if_not_exists: true
      typesense_client:
        schema:
        connection_info:
$ tiled serve config config.yml