Implementation of vector database

Status: Open. vkirkl opened this issue 6 months ago • 0 comments

Feature description and context

This is the parent issue for implementing a vector database within Wildbook. The work includes storing embeddings generated during identification, archiving embeddings, using embeddings for matching, and migrating any connected legacy code.

Feature sign-off requirements

evaluation of technology options complete [ticket], to include documentation of:
- demonstrable benefit of adopting a new technology if the current one (i.e., OpenSearch) is not recommended
- upgrade / migration path
- whether the recommendations are "nice to have" or necessary for a vector database MVP
if a new technology is recommended, it should be added to deployment code and documentation [ticket]
documentation of vector database configuration [ticket]
- management and tuning
test plan and implemented tests [ticket]
upgrade issues [parent ticket]
- migration of current code to the vector database

Notes from 7/21

how is the vector produced:
- created at the same time as detection?
- created later, based on an annotation (from detection or manual)? if so, how is this annotation passed to ML?
- (maybe both options above)
proposed: PostgreSQL as the permanent source of truth for vector data, with the OpenSearch index built from it (already backed up, etc.)
experiment with storing vectors as fields of the OpenSearch documents in the annotation index (versus a separate vector index) [jon]
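
To make the last two notes concrete, here is a minimal sketch of what "vectors as fields in the annotation index, rebuilt from PostgreSQL" could look like. It only builds request bodies and does not talk to a server. The index name `annotation`, the field names `annotation_id` and `embedding`, the dimension, and all function names are assumptions for illustration, not Wildbook's actual schema. The `knn_vector` field type, the `index.knn` setting, the `knn` query clause, and the `_bulk` action format come from OpenSearch's k-NN plugin and Bulk API.

```python
import json
from typing import Iterable, Iterator

# Hypothetical names and sizes; none of these come from the Wildbook codebase.
INDEX_NAME = "annotation"
VECTOR_FIELD = "embedding"
EMBEDDING_DIM = 4  # toy value; real embedding models use far larger dimensions


def annotation_index_body(dim: int = EMBEDDING_DIM) -> dict:
    """Settings and mapping that store the vector as a field of each annotation document."""
    return {
        "settings": {"index": {"knn": True}},  # enables k-NN search on this index
        "mappings": {
            "properties": {
                "annotation_id": {"type": "keyword"},
                VECTOR_FIELD: {"type": "knn_vector", "dimension": dim},
            }
        },
    }


def bulk_lines(
    rows: Iterable[tuple[str, list[float]]], dim: int = EMBEDDING_DIM
) -> Iterator[str]:
    """Turn (annotation_id, vector) rows, e.g. read from PostgreSQL, into _bulk NDJSON lines."""
    for annotation_id, vector in rows:
        if len(vector) != dim:
            raise ValueError(f"{annotation_id}: expected {dim} values, got {len(vector)}")
        # Each document is preceded by an action line naming the index and document id.
        yield json.dumps({"index": {"_index": INDEX_NAME, "_id": annotation_id}})
        yield json.dumps({"annotation_id": annotation_id, VECTOR_FIELD: vector})


def knn_query(vector: list[float], k: int = 10) -> dict:
    """Search body that asks for the k nearest stored vectors to `vector`."""
    return {"size": k, "query": {"knn": {VECTOR_FIELD: {"vector": vector, "k": k}}}}
```

Because PostgreSQL stays the source of truth in this sketch, the OpenSearch index is disposable: it could be dropped, recreated from `annotation_index_body()`, and refilled by streaming rows through `bulk_lines()`. A separate vector index would use the same building blocks with a different index name and a smaller document.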

Jul 01 '25 17:07 vkirkl