[Bug]: huggingface embeddings broken
What happened?
code:
import litellm

litellm.embedding(
    model="huggingface/TaylorAI/bge-micro-v2",
    input=["batch_text"],
)
Relevant log output
Request to litellm:
litellm.embedding(model='huggingface/TaylorAI/bge-micro-v2', input=['batch_text'])
self.optional_params: {}
kwargs[caching]: False; litellm.cache: None
self.optional_params: {}
POST Request Sent from LiteLLM:
curl -X POST \
https://api-inference.huggingface.co/models/TaylorAI/bge-micro-v2 \
-H 'content-type: application/json' -H 'Authorization: Bearer hf_WQdylWJwESdfqC********************' \
-d '{'inputs': ['batch_text']}'
RAW RESPONSE:
<Response [400]>
RAW RESPONSE:
["Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`"]
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
self.failure_callback: []
{
"name": "APIError",
"message": "HuggingfaceException - [\"Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`\"]",
"stack": "---------------------------------------------------------------------------
HuggingfaceError Traceback (most recent call last)
File ~/miniforge3/lib/python3.10/site-packages/litellm/main.py:2737, in embedding(model, input, dimensions, timeout, api_base, api_version, api_key, api_type, caching, user, custom_llm_provider, litellm_call_id, litellm_logging_obj, logger_fn, **kwargs)
2731 api_key = (
2732 api_key
2733 or litellm.huggingface_key
2734 or get_secret(\"HUGGINGFACE_API_KEY\")
2735 or litellm.api_key
2736 )
-> 2737 response = huggingface.embedding(
2738 model=model,
2739 input=input,
2740 encoding=encoding,
2741 api_key=api_key,
2742 api_base=api_base,
2743 logging_obj=logging,
2744 model_response=EmbeddingResponse(),
2745 )
2746 elif custom_llm_provider == \"bedrock\":
File ~/miniforge3/lib/python3.10/site-packages/litellm/llms/huggingface_restapi.py:750, in Huggingface.embedding(self, model, input, api_key, api_base, logging_obj, model_response, encoding)
749 if \"error\" in embeddings:
--> 750 raise HuggingfaceError(status_code=500, message=embeddings[\"error\"])
752 output_data = []
HuggingfaceError: [\"Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`\"]
During handling of the above exception, another exception occurred:
APIError Traceback (most recent call last)
/var/folders/96/56v2bn1x2gjd39_zw8jp9_s80000gn/T/ipykernel_58326/3538532934.py in ?()
----> 1 litellm.embedding(
2 model=\"huggingface/TaylorAI/bge-micro-v2\",
3 # model=\"embed-english-v3.0\",
4 input=[\"batch_text\"],
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(*args, **kwargs)
2848 if (
2849 liteDebuggerClient and liteDebuggerClient.dashboard_url != None
2850 ): # make it easy to get to the debugger logs if you've initialized it
2851 e.message += f\"\
Check the log in your dashboard - {liteDebuggerClient.dashboard_url}\"
-> 2852 raise e
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(*args, **kwargs)
2848 if (
2849 liteDebuggerClient and liteDebuggerClient.dashboard_url != None
2850 ): # make it easy to get to the debugger logs if you've initialized it
2851 e.message += f\"\
Check the log in your dashboard - {liteDebuggerClient.dashboard_url}\"
-> 2852 raise e
~/miniforge3/lib/python3.10/site-packages/litellm/main.py in ?(model, input, dimensions, timeout, api_base, api_version, api_key, api_type, caching, user, custom_llm_provider, litellm_call_id, litellm_logging_obj, logger_fn, **kwargs)
2889 api_key=api_key,
2890 original_response=str(e),
2891 )
2892 ## Map to OpenAI Exception
-> 2893 raise exception_type(
2894 model=model, original_exception=e, custom_llm_provider=custom_llm_provider
2895 )
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(model, original_exception, custom_llm_provider, completion_kwargs)
8340 ):
8341 threading.Thread(target=get_all_keys, args=(e.llm_provider,)).start()
8342 # don't let an error with mapping interrupt the user from receiving an error from the llm api calls
8343 if exception_mapping_worked:
-> 8344 raise e
8345 else:
8346 raise original_exception
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(model, original_exception, custom_llm_provider, completion_kwargs)
8340 ):
8341 threading.Thread(target=get_all_keys, args=(e.llm_provider,)).start()
8342 # don't let an error with mapping interrupt the user from receiving an error from the llm api calls
8343 if exception_mapping_worked:
-> 8344 raise e
8345 else:
8346 raise original_exception
APIError: HuggingfaceException - [\"Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`\"]"
}
RAW RESPONSE: ["Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`"]
@dhruv-anand-aintech is this a text-embeddings-inference provider?
No, it should be using the hugging face free inference API. Not sure if that's the same as tei in litellm
Do you know if this has been fixed since? With v1.40.3 I still have the issue, FYI.
thanks for the bump @Extremys
looking into this now - it looks like hf embeddings has different api schemas:
- embed
- rerank
- /similarity
based on the TaylorAI hf api example, it looks like it's expecting the /similarity endpoint
it looks like, if we make a GET request to the model, we might be able to tell what format to use based on the pipeline tag:
sentence-similarity = /similarity spec
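A rough sketch of that detection idea (hypothetical, not litellm's actual code; the model id, env var name, and payload shapes are taken from the Hugging Face hub / Inference API docs):

# Hypothetical sketch: pick the request format from the model's pipeline tag.
# Not litellm's implementation; model id and env var name are placeholders.
import os
import requests

model_id = "TaylorAI/bge-micro-v2"
headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_API_KEY']}"}

# The hub metadata endpoint reports the pipeline tag for a model.
meta = requests.get(f"https://huggingface.co/api/models/{model_id}").json()
pipeline_tag = meta.get("pipeline_tag")  # e.g. "sentence-similarity" or "feature-extraction"

api_url = f"https://api-inference.huggingface.co/models/{model_id}"
texts = ["batch_text"]

if pipeline_tag == "sentence-similarity":
    # similarity models expect a source sentence plus candidate sentences,
    # not a bare list - which is what the 400 above complains about
    payload = {"inputs": {"source_sentence": texts[0], "sentences": texts}}
else:
    # feature-extraction style models accept a plain list of strings
    payload = {"inputs": texts}

resp = requests.post(api_url, headers=headers, json=payload)
print(resp.status_code, resp.json())

Note that for sentence-similarity the Inference API returns similarity scores rather than embedding vectors, so mapping that response back into an embedding result is a separate question.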
Great! For my part, to be exact, I'm trying to use a TEI instance, but it should be pretty similar. We have the swagger here: https://huggingface.github.io/text-embeddings-inference/ There is also a /v1/embeddings OpenAI-compatible endpoint (I tried to use it, but without success so far).
what's the error you're seeing? @Extremys
wondering why openai/ doesn't just work https://docs.litellm.ai/docs/providers/openai_compatible#usage---embedding
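For reference, the pattern from that docs page would look roughly like this (the api_base, key, and model name are placeholders for the TEI deployment, so treat it as a sketch rather than a verified call):

# Sketch of the openai/ passthrough route from the linked docs.
# api_base, api_key, and model name below are placeholders for a TEI instance.
import litellm

response = litellm.embedding(
    model="openai/gte-large-en-v1.5",       # openai/ prefix = any OpenAI-compatible endpoint
    api_base="https://my-tei-url.com/v1",   # TEI exposes an OpenAI-style /v1/embeddings
    api_key="<myteikey>",
    input=["This is a test document."],
)
print(response.data[0]["embedding"][:5])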
I'm getting this:
15:30:26 - LiteLLM:INFO: utils.py:1310 -
POST Request Sent from LiteLLM:
curl -X POST \
https://my-tei-url.com/v1 \
-d '{'model': 'gte-large-en-v1.5', 'input': ['This is a test document.'], 'user': 'myuser', 'encoding_format': 'base64', 'no_proxy': True}'
LiteLLM Proxy: Inside Proxy Logging Pre-call hook!
Inside Max Parallel Request Pre-Call Hook
get cache: cache key: <litellmkey>::2024-07-30-15-30::request_count; local_only: False
get cache: cache result: None
current: None
async get cache: cache key: myuser; local_only: False
in_memory_result: None
get cache: cache result: None
Inside Max Budget Limiter Pre-Call Hook
get cache: cache key: myuser_user_api_key_user_id; local_only: False
get cache: cache result: None
Inside Cache Control Check Pre-Call Hook
LiteLLM Proxy: final data being sent to embeddings call: {'input': ['This is a test document.'], 'model': 'gte-large-en-v1.5', 'encoding_format': 'base64', 'proxy_server_request': {'url': 'http://mylitellmurl.com/v1/embeddings', 'method': 'POST', 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'authorization': 'Bearer <litellmkey>', 'x-stainless-async': 'false'}, 'body': {'input': [[2028, 374, 264, 1296, 2246, 13]], 'model': 'gte-large-en-v1.5', 'encoding_format': 'base64'}}, 'user': 'myuser', 'metadata': {'user_api_key': '<litellmkey>', 'user_api_key_metadata': {'app': 'genai_csr'}, 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'x-stainless-async': 'false'}, 'litellm_api_version': '1.40.3', 'user_api_key_alias': None, 'global_max_parallel_requests': None, 'user_api_key_user_id': 'myuser', 'user_api_key_team_id': None, 'user_api_key_team_alias': None, 'endpoint': 'http://mylitellmurl.com/v1/embeddings'}}
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: None
get cache: cache result: None
get cache: cache key: 77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a_async_client; local_only: True
get cache: cache result: <openai.AsyncOpenAI object at 0x7fa5a01af310>
get cache: cache key: 77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a_max_parallel_requests_client; local_only: True
get cache: cache result: None
Request to litellm:
litellm.aembedding(api_key='<myteikey>', api_base='https://my-tei-url.com/v1', model='openai/gte-large-en-v1.5', no_proxy=True, input=['This is a test document.'], caching=False, client=<openai.AsyncOpenAI object at 0x7fa5a01af310>, encoding_format='base64', proxy_server_request={'url': 'http://mylitellmurl.com/v1/embeddings', 'method': 'POST', 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'authorization': 'Bearer <litellmkey>', 'x-stainless-async': 'false'}, 'body': {'input': [[2028, 374, 264, 1296, 2246, 13]], 'model': 'gte-large-en-v1.5', 'encoding_format': 'base64'}}, user='myuser', metadata={'user_api_key': '<litellmkey>', 'user_api_key_metadata': {'app': 'genai_csr'}, 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'x-stainless-async': 'false'}, 'litellm_api_version': '1.40.3', 'user_api_key_alias': None, 'global_max_parallel_requests': None, 'user_api_key_user_id': 'myuser', 'user_api_key_team_id': None, 'user_api_key_team_alias': None, 'endpoint': 'http://mylitellmurl.com/v1/embeddings', 'model_group': 'gte-large-en-v1.5', 'deployment': 'openai/gte-large-en-v1.5', 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}, 'api_base': 'https://my-tei-url.com/v1', 'caching_groups': None}, model_info={'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}, timeout=None, max_retries=0)
Initialized litellm callbacks, Async Success Callbacks: [<litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7fa5a0a647c0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7fa5a0a64880>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7fa5a0a64850>, <litellm._service_logger.ServiceLogging object at 0x7fa59ffec5e0>]
init callback list: <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7fa5a0a64850>
init callback list: <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7fa5a0a64880>
init callback list: langfuse
init callback list: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x7fa5a01f9c30>>
init callback list: <bound method SlackAlerting.response_taking_too_long_callback of <litellm.integrations.slack_alerting.SlackAlerting object at 0x7fa5a0a64310>>
init callback list: <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7fa5a0a647c0>
init callback list: <litellm._service_logger.ServiceLogging object at 0x7fa59ffec5e0>
self.optional_params: {}
ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
self.optional_params: {'user': 'myuser', 'encoding_format': 'base64', 'no_proxy': True}
RAW RESPONSE:
AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
Logging Details: logger_fn - None | callable(logger_fn) - False
Traceback (most recent call last):
File "/code/litellm/litellm/utils.py", line 8738, in exception_type
raise APIConnectionError(
litellm.exceptions.APIConnectionError: APIConnectionError: OpenAIException - AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/code/litellm/litellm/proxy/proxy_server.py", line 4809, in embeddings
response = await llm_router.aembedding(**data)
File "/code/litellm/litellm/router.py", line 1742, in aembedding
raise e
File "/code/litellm/litellm/router.py", line 1739, in aembedding
response = await self.async_function_with_fallbacks(**kwargs)
File "/code/litellm/litellm/router.py", line 2101, in async_function_with_fallbacks
raise original_exception
File "/code/litellm/litellm/router.py", line 2005, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
File "/code/litellm/litellm/router.py", line 2197, in async_function_with_retries
raise original_exception
File "/code/litellm/litellm/router.py", line 2120, in async_function_with_retries
response = await original_function(*args, **kwargs)
File "/code/litellm/litellm/router.py", line 1832, in _aembedding
raise e
File "/code/litellm/litellm/router.py", line 1819, in _aembedding
response = await response
File "/code/litellm/litellm/utils.py", line 3901, in wrapper_async
raise e
File "/code/litellm/litellm/utils.py", line 3729, in wrapper_async
result = await original_function(*args, **kwargs)
File "/code/litellm/litellm/main.py", line 2821, in aembedding
raise exception_type(
File "/code/litellm/litellm/utils.py", line 10000, in exception_type
raise original_exception
File "/code/litellm/litellm/main.py", line 2812, in aembedding
response = await init_response
File "/code/litellm/litellm/llms/openai.py", line 945, in aembedding
raise e
File "/code/litellm/litellm/llms/openai.py", line 928, in aembedding
response = await openai_aclient.embeddings.create(**data, timeout=timeout) # type: ignore
TypeError: AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
INFO: 192.9.18.5:3132 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
15:30:29 - LiteLLM Router:INFO: router.py:1827 - litellm.aembedding(model=None) Exception No deployments available for selected model, Try again in 60 seconds. Passed model=gte-large-en-v1.5. pre-call-checks=False, allowed_model_region=n/a
15:30:30 - LiteLLM Router:INFO: router.py:1827 - litellm.aembedding(model=None) Exception No deployments available for selected model, Try again in 60 seconds. Passed model=gte-large-en-v1.5. pre-call-checks=False, allowed_model_region=n/a
15:30:30 - LiteLLM Router:INFO: router.py:1827 - litellm.aembedding(model=None) Exception No deployments available for selected model, Try again in 60 seconds. Passed model=gte-large-en-v1.5. pre-call-checks=False, allowed_model_region=n/a
my config file:
model_list:
  - model_name: gte-large-en-v1.5
    litellm_params:
      model: openai/gte-large-en-v1.5
      api_base: "os.environ/TEI_URL"
      api_key: "os.environ/TEI_API_KEY"
      no_proxy: true

environment_variables:
  TEI_URL: "https://my-tei-url.com/v1"
  TEI_API_KEY: <myteikey>

litellm_settings:
  set_verbose: True
  drop_params: True
  success_callback: ["langfuse"]
  request_timeout: 300
the testing code:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    openai_api_base="https://my-litellm-url.com/v1",
    model="gte-large-en-v1.5",
    openai_api_key="<mykey>",
)
text = "This is a test document."
query_result = embeddings.embed_query(text)
doc_result = embeddings.embed_documents([text])
print(query_result)
print(doc_result)
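As a side note, the same check can be run without langchain by pointing the plain OpenAI client at the proxy (the URL, key, and model name below are the placeholders used elsewhere in this thread). Per the log above, langchain sent pre-tokenized input and encoding_format='base64'; this sketch sends plain strings, which can help isolate where the failure happens:

# Minimal sketch: call the litellm proxy directly with the OpenAI client.
# base_url, api_key, and model are placeholders from this thread.
from openai import OpenAI

client = OpenAI(
    base_url="https://my-litellm-url.com/v1",
    api_key="<mykey>",
)

resp = client.embeddings.create(
    model="gte-large-en-v1.5",
    input=["This is a test document."],  # plain strings, no pre-tokenization
)
print(len(resp.data[0].embedding))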
@Extremys can you bump and re-share? We now show the error received from the provider in the 'no deployments available' message.
It should be better now :)
RAW RESPONSE: AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
@Extremys it looks like the no_proxy keyword is causing errors. Is that intended?
Not really, it was for an older version but it's not necessary anymore. I don't know why I didn't catch it before, thank you :)