feat: vectorize
Closes #174, Related #173
Adds support for a single Vectorize database.
For docs: Vectorize is exposed through Cloudflare bindings and accessed via `const vectorize = hubVectorize()`, so Cloudflare's client API docs apply: https://developers.cloudflare.com/vectorize/reference/client-api/
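
As a rough illustration of what relying on Cloudflare's client API looks like, the sketch below prepares vectors in the shape that Vectorize's `insert()`/`upsert()` accept (`id`, `values`, optional `namespace` and `metadata`) and rejects embeddings whose length does not match the index. The `toVectors` helper, the `Doc` type, and the 768-dimension default are assumptions made for this example and are not part of this PR.

```typescript
// Shape of a vector as accepted by Vectorize's insert()/upsert().
interface VectorizeVectorLike {
  id: string
  values: number[]
  namespace?: string
  metadata?: Record<string, string | number | boolean>
}

// Hypothetical document type, used only for this example.
interface Doc {
  id: string
  title: string
}

// Pair each document with its embedding and reject embeddings whose
// length does not match the dimensions the index was created with.
function toVectors(
  docs: Doc[],
  embeddings: number[][],
  dimensions = 768, // assumed index dimensions
  namespace?: string,
): VectorizeVectorLike[] {
  if (docs.length !== embeddings.length) {
    throw new Error(`Expected ${docs.length} embeddings, got ${embeddings.length}`)
  }
  return docs.map((doc, i) => {
    const values = embeddings[i] ?? []
    if (values.length !== dimensions) {
      throw new Error(
        `Embedding for "${doc.id}" has ${values.length} dimensions, expected ${dimensions}`,
      )
    }
    const vector: VectorizeVectorLike = {
      id: doc.id,
      values,
      metadata: { title: doc.title },
    }
    if (namespace !== undefined) vector.namespace = namespace
    return vector
  })
}

// Usage inside a server route (not executed here):
//   const vectorize = hubVectorize()
//   await vectorize.upsert(toVectors(docs, embeddings))
//   const results = await vectorize.query(queryEmbedding, { topK: 5 })
```

Validating dimensions before calling the binding keeps a mismatch between the embedding model and the index from surfacing only as a remote error.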
Remaining tasks
- [ ] Docs
- [ ] Tests
- [ ] Use remote bindings (via the NuxtHub API, not the API of the deployed app) for local dev (PR & backend)
- [ ] Remove code for passing accountId to wrangler and adding development vectorize binding
- [ ] Add Vectorize bindings on deploy (backend)
- [x] Wait for #173 to get merged
Local development
- Wrangler doesn't currently support Vectorize with local bindings, so `--remote` is necessary to use Vectorize via the deployed application. See https://github.com/cloudflare/workers-sdk/issues/4360
- An index can be made via Wrangler or via the API.
- Currently, the only way to delete all contents of an index is to recreate the index itself and redeploy (rather than fetching all vectors and mass deleting them).
- During local development, remote bindings like in #173 could be used.
- Specifying index details (dimensions, metric) could be done in `nuxt.config.ts`, like `vectorize: { metric: 'cosine', dimensions: 768 }`. However, the DX might be confusing, as changing the config probably shouldn't automatically recreate the production index, to prevent accidental data loss. A complex solution is renaming the old binding but keeping the index if a new index takes its place.
- For local development, there needs to be a way to easily recreate the remote database (either to wipe data or to change dimensions/metric).
  - Maybe via Nuxt devtools or a CLI command?
- Vectorize supports namespaces in indexes (basically like tables), but all vectors in an index need to share the same dimensions and metric.
  - Any application using text embedding models of different sizes (small, base, large) simultaneously would need separate bindings due to the different dimension sizes.
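
One way to keep a config change from silently recreating a production index is to compute a plan first and require explicit confirmation for anything destructive. The sketch below assumes a `vectorize` config shaped like the one proposed above; `planIndexChange`, `VectorizeIndexConfig`, and `IndexPlan` are hypothetical names for illustration, not an API from this PR or from Cloudflare.

```typescript
// Distance metrics offered when creating a Vectorize index.
type VectorizeMetric = 'cosine' | 'euclidean' | 'dot-product'

// Hypothetical shape of the proposed `vectorize` key in nuxt.config.ts.
interface VectorizeIndexConfig {
  metric: VectorizeMetric
  dimensions: number
}

// Result of comparing the desired config with the existing index.
type IndexPlan =
  | { action: 'create' }
  | { action: 'none' }
  | { action: 'needs-recreate'; reasons: string[] }

// Compare the desired config with the existing index (null if none exists).
// This only reports what would be needed; it never recreates anything itself.
function planIndexChange(
  desired: VectorizeIndexConfig,
  existing: VectorizeIndexConfig | null,
): IndexPlan {
  if (!Number.isInteger(desired.dimensions) || desired.dimensions <= 0) {
    throw new Error(`Invalid dimensions: ${desired.dimensions}`)
  }
  if (existing === null) return { action: 'create' }

  const reasons: string[] = []
  if (existing.dimensions !== desired.dimensions) {
    reasons.push(`dimensions ${existing.dimensions} -> ${desired.dimensions}`)
  }
  if (existing.metric !== desired.metric) {
    reasons.push(`metric ${existing.metric} -> ${desired.metric}`)
  }
  return reasons.length === 0
    ? { action: 'none' }
    : { action: 'needs-recreate', reasons }
}
```

A devtools button or CLI command could then act on a `needs-recreate` plan only after the user confirms, which keeps the data loss explicit.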
It turns out Vectorize doesn't support local development; it only works through Wrangler with `--remote`. This is unlike Workers AI, which supports local development, although the models are actually run on Cloudflare with your account.
Issue tracking Vectorize local bindings: https://github.com/cloudflare/workers-sdk/issues/4360
Supported resource bindings in different environments: https://developers.cloudflare.com/workers/testing/local-development/#supported-resource-bindings-in-different-environments
For now, this feature could only be supported with `--remote` (either via NuxtHub's proxy or Wrangler remote). This would involve adding Vectorize endpoints to NuxtHub's backend, and I don't think that's OSS. Alternatively, it could be blocked until local development is supported with Vectorize. Alternatively, t
Thanks for looking at it so quickly.
I think this could be possible within the OSS anyway, as you would need to deploy your application first in order to use Vectorize.
Would you be happy to work on the proxy API routes?
I didn't realise those routes were for anything more than just devtools preview with --remote for some reason lol 😄 I've added the proxy routes, ~~but I don't think I can test them yet. If I understand correctly, bindings are added once the build hook is ran. Could you support adding the Vectorize bindings?~~ Edit: With a fresh mind I realised I can manually add the bindings from CF dash myself 🤦
I'll continue my dissertation, where I'll be testing both the AI and Vectorize integrations to build a simple vector search engine.
Vectorize and AI work ✨
Got some small things to clean up code wise, which I'll get sorted hopefully by mid ~~July~~ August.
45 minutes of rebasing 😓 git is not my passion
How did you add the Vectorize binding in the CF dashboard? Mine only shows "Get Started" after creating it with `wrangler vectorize create default --dimensions=768 --metric=cosine`. Deployments do not seem to create the index or the binding, and running dev with `--remote` shows `WARN Remote storage vectorize is enabled locally but it's not enabled in the remote project. Deploy a new version with vectorize enabled to use it remotely.`
Hi @posva, I've updated the PR description with some more steps. Basically you'll need to create the index via the wrangler CLI then assign it via the Workers dash. Let me know if you need any help here or on Discord @RihanArfan :)
Thanks a lot for the detailed instructions! I was already doing that and I still have the same issue so I asked around Cloudflare to find out 😅
@posva That error comes from NuxtHub when the local server checks the manifest of the deployed version and sees the feature isn't enabled.
- Local Nuxt with `--remote` queries Deployed/Remote Nuxt's `/api/_hub/manifest.get.ts` to see what features are enabled.
  - The manifest determines whether Vectorize is enabled by running `.describe()` on the index via the binding `VECTORIZE`.
- If local Nuxt has Vectorize enabled but Deployed/Remote Nuxt says it's not enabled, that causes the error.
You can check whether Vectorize is enabled on remote from local Nuxt Devtools -> Runtime Configs -> Private Runtime Config (bottom) -> hub.remoteManifest.features.storage.vectorize === true
If it's not, make sure you've assigned a VECTORIZE binding from the Cloudflare Dash to the production environment (or preview if you've configured NuxtHub to use that with --remote).
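
The check described above can be sketched roughly as follows. This is a simplified illustration, not NuxtHub's actual implementation: `FeatureManifest`, `DescribableIndex`, `isVectorizeEnabled`, and `missingRemoteFeatures` are hypothetical names, and the manifest is reduced to just its `storage` flags.

```typescript
// Simplified manifest: only the storage feature flags (assumed shape).
interface FeatureManifest {
  storage: Record<string, boolean>
}

// Minimal stand-in for the Vectorize binding: only describe() is needed here.
interface DescribableIndex {
  describe(): Promise<unknown>
}

// Remote side: treat Vectorize as enabled only if the binding exists
// and describe() resolves without throwing.
async function isVectorizeEnabled(index: DescribableIndex | undefined): Promise<boolean> {
  if (!index) return false
  try {
    await index.describe()
    return true
  } catch {
    return false
  }
}

// Local side: list features that are enabled locally but not reported
// as enabled by the remote manifest. Each entry would trigger a warning.
function missingRemoteFeatures(local: FeatureManifest, remote: FeatureManifest): string[] {
  return Object.keys(local.storage).filter(
    (name) => local.storage[name] === true && remote.storage[name] !== true,
  )
}
```

With this model, a missing `VECTORIZE` binding on the deployed project makes the remote flag false, so the local side reports `vectorize` as missing even though it is enabled in the local config.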
Thanks again for trying to help with so much information!
The issue I have is that I can't change it in the dashboard:
Maybe a bug on cloudflare?
@posva Hmm, that is certainly very strange. This is what it should look like:
Have you got a Workers paid subscription ($5/m) and also have created a Vectorize index via the Wrangler CLI?
Yes