open-text-embeddings
Run On Multi-GPU
Is it possible? My run failed with a CUDA out-of-memory error.
You can set the device_map when loading the embedding model with transformers.
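For example, a minimal sketch (not the project's actual loading code) of spreading a model across the available GPUs with `device_map="auto"`; the model name here is just an assumption, and the `accelerate` package must be installed:

```python
from transformers import AutoTokenizer, AutoModel

model_name = "intfloat/e5-large-v2"  # assumed model; substitute your own
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" lets accelerate shard the model across available GPUs
model = AutoModel.from_pretrained(model_name, device_map="auto")

inputs = tokenizer(["query: hello world"], padding=True, truncation=True, return_tensors="pt")
# move inputs to the device holding the first model shard
inputs = {k: v.to(model.device) for k, v in inputs.items()}

outputs = model(**inputs)
# simple mean pooling over tokens (ignores the attention mask for brevity)
embedding = outputs.last_hidden_state.mean(dim=1)
```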
@linkedlist771 Thanks for the suggestion. I'd appreciate it if you could send me a PR for the implementation.
@limcheekin Thank you for your response. I'd be happy to attempt submitting a PR to address this issue. I'll start working on this as soon as possible and submit a PR for your review when it's ready. If you have any specific requirements or suggestions for the implementation, please let me know. I'll strive to ensure the PR adheres to the project's coding standards and best practices.
If I encounter any issues or need clarification during the implementation process, I'll update the progress in this issue. Thank you again for the opportunity to contribute to the project.
Once I launch the server, how can I use it in the same way as below?
```python
from openai import OpenAI
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="fake-api-key", base_url="http://localhost:8000")
embeddings = client.embeddings.create(
    input=["input"],
)
```
Thanks for your interest. I don't have experience with the AsyncOpenAI class. I think it is not supported now; it is a good candidate for a future enhancement. Please help to open an issue.
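For reference, a minimal sketch of calling the server with the synchronous OpenAI client instead; the port, the `/v1` path, and the model name are assumptions about your deployment, so adjust them to match how you started the server:

```python
from openai import OpenAI

# Sketch only: the base_url (including the /v1 suffix) and the model name
# are assumptions; change them to match your running server.
client = OpenAI(api_key="fake-api-key", base_url="http://localhost:8000/v1")

response = client.embeddings.create(
    model="intfloat/e5-large-v2",  # assumed model name served by your instance
    input=["input"],
)
print(len(response.data[0].embedding))
```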