
Set Up Embeddings in Default Index Settings

Open: alexaryn opened this issue 2 years ago • 1 comment

Is your feature request related to a problem? Please describe.
The out-of-the-box experience for Sycamore doesn't handle embeddings well. Since embeddings are the main use case, the default index settings should provide a KNN index for them.

Describe the solution you'd like
Default index settings in sycamore.writers.opensearch should be set up for embedding-based indexing and retrieval. Reasonable names should be used consistently for text, embeddings, title, author, etc.

Describe alternatives you've considered
Pass in index_settings to docset.write.opensearch(). These will be 90% copy-pasta in every caller, and the copies can diverge and cause problems.
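For illustration, here is a rough sketch of the kind of index_settings that currently gets copied around. The helper name knn_index_settings, the field names "text" and "embedding", the dimension of 384, and the choice of HNSW with the faiss engine are all assumptions made for this example, not Sycamore defaults.

```python
from typing import Any, Dict


def knn_index_settings(dimension: int, embedding_field: str = "embedding") -> Dict[str, Any]:
    """Build OpenSearch index settings with a k-NN vector field.

    Hypothetical helper: field names and method parameters are
    illustrative assumptions, not Sycamore's actual defaults.
    """
    return {
        "body": {
            # Enable the k-NN plugin for this index.
            "settings": {"index.knn": True},
            "mappings": {
                "properties": {
                    # Plain full-text field for the document text.
                    "text": {"type": "text"},
                    # Vector field; dimension must match the embedding model.
                    embedding_field: {
                        "type": "knn_vector",
                        "dimension": dimension,
                        "method": {
                            "name": "hnsw",
                            "engine": "faiss",
                            "space_type": "l2",
                        },
                    },
                }
            },
        }
    }


index_settings = knn_index_settings(384)
```

A dict like this is what would be passed as index_settings to docset.write.opensearch(). Building it in one shared place, ideally as the library default, is what would keep the copies from drifting apart.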

Additional context
We should also revisit the integration tests to see whether, and how, they can be simplified via defaults.

alexaryn, Oct 02 '23 20:10

It appears that index_settings can be specified when the index is created explicitly. It also appears that the index can be created implicitly, simply by ingesting a document. We should check whether it matters which way it's done.
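One way to sidestep the question is to always create the index explicitly before ingesting. The sketch below assumes a client shaped like the opensearch-py OpenSearch client (indices.exists and indices.create), and assumes index_settings wraps the request under a "body" key. The helper name ensure_index is hypothetical. My understanding is that dynamic mapping on implicit creation would type a list of floats as a plain float field rather than knn_vector unless an index template applies, which is why explicit creation looks safer, but that should be verified against the OpenSearch version in use.

```python
from typing import Any, Dict


def ensure_index(client: Any, index_name: str, index_settings: Dict[str, Any]) -> bool:
    """Create the index with explicit settings if it does not exist yet.

    Returns True if the index was created, False if it already existed.
    `client` is assumed to behave like opensearchpy.OpenSearch.
    """
    if client.indices.exists(index=index_name):
        # Existing index: its mappings are left untouched.
        return False
    # Explicit creation, so the k-NN mapping is in place before any
    # document is ingested.
    client.indices.create(index=index_name, body=index_settings["body"])
    return True
```

Calling this before the first write means ingestion never triggers implicit index creation, so the two paths cannot produce different mappings.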

alexaryn, Oct 02 '23 20:10