Running Embeddings encoding locally?
Is there any plan to enable the deployment of a model locally to compute embeddings on tokenized text?
I'm currently using "text-embedding-ada-002" via the API and it's fine, but I'm working with indexes of >1M items, and building an index that size over web requests is a pain on many levels. I'd love to find a better-performing way to do this in the future.
No plans for a local model. (That would be more complicated to get working than an API call.)
What are the pain points you'd like to see fixed?
The API call works fine; I'm just trying to get embeddings for a pandas dataframe with ~500K entries, and even with parallelization it's going to take about 2 days, assuming no connection trouble (which is a mess to unwind). It would be kind of cool to have a Docker image that could be used to calculate embeddings locally, even if it requires a bunch of GPUs... some of us could use it :)
At the moment I'm testing with subsets of 1,000 units, and I regularly hit one of these errors:
This throws a wrench in the whole process...
I'm sorry, that's annoying. How frequently are you hitting it? Can you do exponential backoff and resume? Will escalate to eng team.
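In the meantime, exponential backoff would look roughly like this (a sketch using the pre-1.0 openai Python package; the exception types caught here are examples, not an exhaustive list):

    import random
    import time

    import openai

    def embed_with_backoff(texts, max_retries=6):
        """Call the embeddings endpoint, retrying with exponential backoff."""
        delay = 1.0
        for attempt in range(max_retries):
            try:
                return openai.Embedding.create(
                    input=texts, model="text-embedding-ada-002"
                )
            except (openai.error.RateLimitError, openai.error.APIError):
                if attempt == max_retries - 1:
                    raise  # out of retries; surface the error to the caller
                time.sleep(delay + random.random())  # jitter spreads out retries
                delay *= 2  # double the wait after each failure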
It doesn't appear to be correlated with the rate at which I'm making requests, but rather with the general level of saturation of the OpenAI API. I can chunk the dataframe into 10-row increments and send them sequentially and it will still hit a snag; at most I've gotten about 40k rows in before it craps out. The error is kind of annoying too because it seems to block automatic retries of the web request. I'll catch it next time and post it here.
BTW I'm making a movie recommendation app :)


Here's a script I wrote for mass processing embeddings in case it's helpful: https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py
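The script takes a .jsonl file where each line is one request body, so generating its input from a dataframe is just a loop. A sketch (the file names and column name here are made up):

    import json

    import pandas as pd

    df = pd.read_csv("movies.csv")  # hypothetical source dataframe

    with open("requests.jsonl", "w") as f:
        for text in df["text"]:
            # one embeddings request body per line
            f.write(json.dumps({"model": "text-embedding-ada-002", "input": text}) + "\n")

Then point the script at it, with something like --requests_filepath requests.jsonl, --save_filepath for the output file, and the embeddings endpoint as --request_url.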
Thanks Ted, that looks great! I'll give it a try. I'd still love to do this locally at some point; I'd like to be able to have 1000x more data embedded soon 😅
Seems to be working great. It throws this error on every new start but it's gathering the embeddings ok.
Traceback (most recent call last):
  File "/Users/stephansturges/GPTs/FlixGPT/api_request_parallel_processor.py", line 302, in call_API
    append_to_jsonl([self.request_json, self.result], save_filepath)
  File "/Users/stephansturges/GPTs/FlixGPT/api_request_parallel_processor.py", line 322, in append_to_jsonl
    json_string = json.dumps(data)
                  ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ClientConnectorError is not JSON serializable
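The crash is the script calling json.dumps on the raw exception object (the ClientConnectorError) when a request fails. A quick local workaround, just a sketch and not necessarily the repo's actual fix, is to let json fall back to str() for anything it can't serialize:

    import json

    def append_to_jsonl(data, filename):
        """Append data as one JSON line; default=str stringifies
        non-serializable objects like exception instances."""
        with open(filename, "a") as f:
            f.write(json.dumps(data, default=str) + "\n")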
Thanks @ted-at-openai, this actually worked great; I let it run overnight and got everything I need (>9 GB of embeddings 🤣). FYI, there is an issue with UTF-8 encoded characters messing up the tokenizer at some point. I'm going to file a PR for a quick and dirty fix, but it could do with some deeper investigation.
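For anyone hitting the same thing before the PR lands: one quick-and-dirty sanitization, purely illustrative and not necessarily what the PR does, is to round-trip each string through UTF-8 and drop whatever can't be encoded:

    def clean_utf8(text: str) -> str:
        # drop code points that don't survive a UTF-8 round trip
        return text.encode("utf-8", errors="ignore").decode("utf-8")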
BTW I'd love to talk to someone at OpenAI about deploying the movie recommendation app I've made with this 😉 Any chance you can put me in contact with someone?
closed, will append PR
FYI here is the app @ted-at-openai
https://gptflix.streamlit.app/
> BTW I'd love to talk to someone at OpenAI about deploying the movie recommendation app I've made with this 😉 Any chance you can put me in contact with someone?
We are pretty swamped these days, unfortunately. What's the ask?
Sorry, nothing! I deployed it myself in the end, to play with Pinecone and Streamlit. I was thinking it would make a cool demo for OpenAI to publish, showing how to build a massive DB on Pinecone (this one fills up an S1 pod) and do context injection at a large scale, but I'm sure you're up to your ears in fun demos and don't have the time for more! I'll make a Loom video / tutorial to explain it all in the next few weeks if I can, between travel for completely unrelated work... there are some aspects of getting hundreds of thousands of embeddings and being able to use them in a vector DB that are not super intuitive yet 😄 Your parallel embeddings retrieval script was super helpful, however!
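For example, loading the vectors: Pinecone takes upserts in capped batches rather than one giant call. A rough sketch with the pinecone client of that era; the index name, environment, and batch size are placeholders:

    import pinecone

    pinecone.init(api_key="...", environment="us-east1-gcp")  # placeholders
    index = pinecone.Index("gptflix")  # hypothetical index name

    # embeddings: list of 1536-dim ada-002 vectors; titles: parallel list of strings
    embeddings = [[0.0] * 1536]  # stand-in data
    titles = ["example movie"]

    BATCH = 100  # keep each upsert call comfortably under Pinecone's size limits
    vectors = [
        (str(i), emb, {"title": title})
        for i, (emb, title) in enumerate(zip(embeddings, titles))
    ]
    for start in range(0, len(vectors), BATCH):
        index.upsert(vectors=vectors[start:start + BATCH])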
You can play with the demo at -> www.gptflix.ai