Wildbook
Implementation of vector database
Feature description and context
This is the parent issue for the implementation of a vector database within Wildbook. Implementation includes storing embeddings generated during identification, archiving of embeddings, use of embeddings for matching, and migration of any connected legacy code.
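As a rough illustration of what "use of embeddings for matching" could mean, the sketch below ranks stored annotation embeddings against a query embedding by cosine similarity. It is a brute-force stand-in, not the Wildbook implementation: the function names, the annotation IDs, and the choice of cosine similarity are all assumptions made for the example, and a vector database would replace the linear scan with an indexed nearest-neighbor search.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(query, stored, top_k=3):
    """Return up to top_k (annotation_id, score) pairs, best match first.

    `stored` is a hypothetical dict of annotation ID -> embedding.
    """
    scored = [(ann_id, cosine_similarity(query, vec)) for ann_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, `rank_matches([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]})` places `"a"` ahead of `"b"`.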
Feature sign-off requirements
- evaluation of technology options complete [ticket], to include documentation of:
  - demonstrable benefit of adopting a new technology if the current one (i.e., OpenSearch) is not recommended
  - upgrade / migration path
  - whether the recommendations are "nice to have" or necessary for a vector database MVP
- if new technology is recommended, it should be added to deployment code and documentation [ticket]
- documentation of vector database configuration [ticket]
  - management and tuning
- test plan and implemented tests [ticket]
- upgrade issues [parent ticket]
  - migration of current code to vector
Notes from 7/21
- how is the vector produced:
  - created at the same time as detection?
  - later, based on an annotation (from detection or manual)? if so: how is this annotation passed to ML?
  - (maybe both options above)
- proposed: PostgreSQL as the permanent source of truth for vector data (it is already backed up, etc.), with the OpenSearch index built from it
- experiment with storing vectors as fields in OpenSearch documents in the annotation index (versus a separate vector index) [jon]
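To make the last two notes concrete, here is a minimal sketch of the two index layouts under discussion and of rebuilding an index from rows read out of PostgreSQL. It only builds request bodies and makes no client calls. The `knn_vector` field type, the `index.knn` setting, the `knn` query, and the `_bulk` action format are OpenSearch k-NN features; everything else (the field names `embedding`, `annotation_id`, and `viewpoint`, the index name, and the dimension of 4) is hypothetical and chosen only for illustration.

```python
import json

EMBEDDING_DIM = 4  # hypothetical; the real value depends on the ML model


def vector_field(dim=EMBEDDING_DIM):
    """Mapping for a single k-NN vector field."""
    return {"type": "knn_vector", "dimension": dim}


def annotation_index_with_vector(dim=EMBEDDING_DIM):
    """Option A: the vector is one more field on each annotation document."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {
            "viewpoint": {"type": "keyword"},  # hypothetical existing field
            "embedding": vector_field(dim),
        }},
    }


def separate_vector_index(dim=EMBEDDING_DIM):
    """Option B: a dedicated index holding only annotation ID and vector."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {
            "annotation_id": {"type": "keyword"},
            "embedding": vector_field(dim),
        }},
    }


def bulk_lines(rows, index_name):
    """Build a _bulk request body from (annotation_id, embedding) rows.

    `rows` stands in for the result of a PostgreSQL query. The "index"
    action replaces the whole document, which suits option B; option A
    would more likely need an "update" action to keep other fields.
    """
    lines = []
    for annotation_id, embedding in rows:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": annotation_id}}))
        lines.append(json.dumps({"annotation_id": annotation_id, "embedding": embedding}))
    return "\n".join(lines) + "\n"


def knn_query(vector, k=5):
    """Search body returning the k nearest neighbors of `vector`."""
    return {"size": k, "query": {"knn": {"embedding": {"vector": vector, "k": k}}}}
```

Because the index is derived data in this proposal, dropping it and replaying `bulk_lines` over the PostgreSQL rows would be enough to rebuild it, which is one reason to keep the vectors out of OpenSearch as the only copy.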