
Implementation of vector database

Open • vkirkl opened this issue 6 months ago • 0 comments

Feature description and context

This is the parent issue for the implementation of a vector database within Wildbook. The implementation includes storing the embeddings generated during identification, archiving embeddings, using embeddings for matching, and migrating any connected legacy code.
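To make "use of embeddings for matching" concrete, here is a minimal brute-force sketch. It assumes cosine similarity as the metric, which this issue does not specify, and the names `cosine_similarity`, `top_matches`, and the annotation IDs are hypothetical. A vector database would replace the linear scan with an approximate nearest-neighbour index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query, stored, k=3):
    """Return the k (annotation_id, score) pairs most similar to the query.

    `stored` maps a hypothetical annotation ID to its embedding.
    """
    scored = [(ann_id, cosine_similarity(query, vec)) for ann_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings, for illustration only.
stored_embeddings = {"ann-a": [1.0, 0.0], "ann-b": [0.0, 1.0], "ann-c": [1.0, 1.0]}
best = top_matches([1.0, 0.0], stored_embeddings, k=2)
```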

Feature sign-off requirements

  • evaluation of technology options complete [ticket], to include documentation of:
    • demonstrable benefit of adopting a new technology, if the current one (i.e., OpenSearch) is not recommended
    • upgrade / migration path
    • whether each recommendation is "nice to have" or necessary for a vector database MVP
  • if new technology is recommended, it should be added to deployment code and documentation [ticket]
  • documentation of vector database configuration [ticket]
    • management and tuning
  • test plan and implemented tests [ticket]
  • upgrade issues [parent ticket]
    • migration of current code to the vector database

Notes from 7/21

  • how is the vector produced:
    • created at the same time as detection?
    • later, based on an annotation (from detection or manual)? if so: how is this annotation passed to ML?
    • (maybe both of the options above)
  • proposed: PostgreSQL as the permanent source of truth for vector data, with the OpenSearch index built from it (already backed up, etc.)
  • experiment with storing vectors as fields of the OpenSearch documents in the annotation index (versus a separate vector index) [jon]
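The last two notes can be sketched as plain request bodies, with no client library involved. This is only an illustration under stated assumptions: the index name `annotation`, the field name `embedding`, the dimension, and the HNSW/Lucene/cosine settings are all hypothetical and should be checked against the deployed OpenSearch version. Rows are assumed to arrive from PostgreSQL as `(annotation_id, vector)` pairs.

```python
import json

EMBEDDING_DIM = 4  # hypothetical; the real value depends on the ML model


def annotation_index_body(dim=EMBEDDING_DIM):
    """Index settings/mapping with the vector stored as a field of the annotation document."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {  # hypothetical field name
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                    },
                }
            }
        },
    }


def bulk_update_payload(rows, index="annotation", dim=EMBEDDING_DIM):
    """Build an NDJSON _bulk body that copies vectors from the truth store
    onto existing annotation documents via partial updates."""
    lines = []
    for annotation_id, vector in rows:
        if len(vector) != dim:
            raise ValueError(f"annotation {annotation_id}: expected {dim} values")
        lines.append(json.dumps({"update": {"_index": index, "_id": annotation_id}}))
        lines.append(json.dumps({"doc": {"embedding": list(vector)}}))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def knn_query(vector, k=10):
    """k-NN search body against the embedding field."""
    return {"size": k, "query": {"knn": {"embedding": {"vector": list(vector), "k": k}}}}


# Rows as they might be read from PostgreSQL (values are made up).
rows = [("ann-1", [0.1, 0.2, 0.3, 0.4]), ("ann-2", [0.4, 0.3, 0.2, 0.1])]
payload = bulk_update_payload(rows)
```

Because the index is rebuilt from PostgreSQL, it can be dropped and regenerated at any time. A separate vector index would use the same mapping and query shape, with an `index` action in place of `update`.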

vkirkl • Jul 01 '25 17:07