llama_index

[Feature Request]: Change OpenAI default embedding model from "text-embedding-ada-002" to "text-embedding-3-small"

Open dsanr opened this issue 2 years ago • 5 comments

Feature Description

According to OpenAI's announcement, the text-embedding-3-small model performs better and costs less than the text-embedding-ada-002 model, so it would be beneficial to make text-embedding-3-small the default. https://openai.com/blog/new-embedding-models-and-api-updates https://openai.com/pricing

Reason

No response

Value of Feature

No response

dsanr avatar Apr 21 '24 10:04 dsanr

@dsanr this would be a giant breaking change. It should probably be

a) saved for a larger version bump, and
b) properly communicated to users ahead of time.

logan-markewich avatar Apr 21 '24 17:04 logan-markewich

@logan-markewich They both have the same dimensionality of 1536. Are there any other reasons why this would be a giant breaking change?

dsanr avatar Apr 22 '24 17:04 dsanr

I tried replacing ada with 3-small and found that they are not compatible, even though the dimensionality is the same. That is, every user who created an index with the default ada model will find that their queries behave quite differently once the default becomes 3-small.
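One way to make this kind of mismatch fail loudly instead of silently returning poor results is to record the embedding model name next to the stored vectors and compare it at query time. The sketch below is only an illustration of that idea: `ToyIndex`, `EmbeddingModelMismatch`, and `check_query_model` are hypothetical names made up for this example, not part of llama_index or the OpenAI client.

```python
from dataclasses import dataclass, field


class EmbeddingModelMismatch(ValueError):
    """Raised when a query would use a different model than the index was built with."""


@dataclass
class ToyIndex:
    # Name of the embedding model used when the stored vectors were created.
    embed_model_name: str
    # Hypothetical storage: document id -> embedding vector.
    vectors: dict = field(default_factory=dict)

    def check_query_model(self, query_model_name: str) -> None:
        """Refuse to compare vectors that come from two different models."""
        if query_model_name != self.embed_model_name:
            raise EmbeddingModelMismatch(
                f"index was built with {self.embed_model_name!r}, "
                f"but the query uses {query_model_name!r}; re-embed the documents first"
            )


# An index built while ada was the default model.
index = ToyIndex(embed_model_name="text-embedding-ada-002")

# Same model at query time: the check passes silently.
index.check_query_model("text-embedding-ada-002")

# Different model at query time: the check raises instead of returning odd results.
try:
    index.check_query_model("text-embedding-3-small")
    mismatch_detected = False
except EmbeddingModelMismatch:
    mismatch_detected = True
```

With a guard like this, a change of default would surface as an explicit error that tells the user to re-embed, rather than as a quiet drop in retrieval quality.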

justinzyw avatar Apr 24 '24 23:04 justinzyw

@justinzyw is correct. It's not so much the dimension that matters; the two models are trained on completely different data. Vectors created with ada are in a completely different vector space from those created with 3-small, so similarity scores between them are meaningless.
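A toy illustration of the point, using no real models: below, each "model" is simulated by a different fixed shuffle and sign flip of the same underlying vector. That is an assumption made purely for demonstration (real embedding models differ in far more complex ways), and `make_toy_model`, `DIM`, and the seeds are invented for this sketch.

```python
import math
import random

DIM = 64  # both toy models share the same dimensionality


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def make_toy_model(seed):
    """Return an embed() function that shuffles and sign-flips coordinates.

    The transform preserves angles between vectors from the SAME model,
    but each seed yields its own coordinate system.
    """
    rng = random.Random(seed)
    perm = list(range(DIM))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(DIM)]

    def embed(meaning):
        return [signs[i] * meaning[perm[i]] for i in range(DIM)]

    return embed


rng = random.Random(0)
doc = [rng.gauss(0, 1) for _ in range(DIM)]
# A query that "means" almost the same thing as the document.
query = [d + 0.1 * rng.gauss(0, 1) for d in doc]

model_a = make_toy_model(seed=1)  # stand-in for the model that built the index
model_b = make_toy_model(seed=2)  # stand-in for a new default model

same_model = cosine(model_a(doc), model_a(query))   # index and query share a model
cross_model = cosine(model_a(doc), model_b(query))  # index and query use different models

print(f"same model:  {same_model:.3f}")
print(f"cross model: {cross_model:.3f}")
```

Even though both toy models output vectors of the same length, only the same-model comparison reflects how close the document and query really are; the cross-model score is essentially noise.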

logan-markewich avatar Apr 25 '24 04:04 logan-markewich

@justinzyw Thanks for trying it out. @logan-markewich Yeah, in that case, we can only take this up in the next major release.

dsanr avatar Apr 25 '24 19:04 dsanr