[Bug]: huggingface embeddings broken
What happened?
code:
import litellm

litellm.embedding(
    model="huggingface/TaylorAI/bge-micro-v2",
    input=["batch_text"],
)
Relevant log output
Request to litellm:
litellm.embedding(model='huggingface/TaylorAI/bge-micro-v2', input=['batch_text'])
self.optional_params: {}
kwargs[caching]: False; litellm.cache: None
self.optional_params: {}
POST Request Sent from LiteLLM:
curl -X POST \
https://api-inference.huggingface.co/models/TaylorAI/bge-micro-v2 \
-H 'content-type: application/json' -H 'Authorization: Bearer hf_WQdylWJwESdfqC********************' \
-d '{'inputs': ['batch_text']}'
RAW RESPONSE:
<Response [400]>
RAW RESPONSE:
["Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`"]
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
self.failure_callback: []
{
"name": "APIError",
"message": "HuggingfaceException - [\"Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`\"]",
"stack": "---------------------------------------------------------------------------
HuggingfaceError Traceback (most recent call last)
File ~/miniforge3/lib/python3.10/site-packages/litellm/main.py:2737, in embedding(model, input, dimensions, timeout, api_base, api_version, api_key, api_type, caching, user, custom_llm_provider, litellm_call_id, litellm_logging_obj, logger_fn, **kwargs)
2731 api_key = (
2732 api_key
2733 or litellm.huggingface_key
2734 or get_secret(\"HUGGINGFACE_API_KEY\")
2735 or litellm.api_key
2736 )
-> 2737 response = huggingface.embedding(
2738 model=model,
2739 input=input,
2740 encoding=encoding,
2741 api_key=api_key,
2742 api_base=api_base,
2743 logging_obj=logging,
2744 model_response=EmbeddingResponse(),
2745 )
2746 elif custom_llm_provider == \"bedrock\":
File ~/miniforge3/lib/python3.10/site-packages/litellm/llms/huggingface_restapi.py:750, in Huggingface.embedding(self, model, input, api_key, api_base, logging_obj, model_response, encoding)
749 if \"error\" in embeddings:
--> 750 raise HuggingfaceError(status_code=500, message=embeddings[\"error\"])
752 output_data = []
HuggingfaceError: [\"Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`\"]
During handling of the above exception, another exception occurred:
APIError Traceback (most recent call last)
/var/folders/96/56v2bn1x2gjd39_zw8jp9_s80000gn/T/ipykernel_58326/3538532934.py in ?()
----> 1 litellm.embedding(
2 model=\"huggingface/TaylorAI/bge-micro-v2\",
3 # model=\"embed-english-v3.0\",
4 input=[\"batch_text\"],
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(*args, **kwargs)
2848 if (
2849 liteDebuggerClient and liteDebuggerClient.dashboard_url != None
2850 ): # make it easy to get to the debugger logs if you've initialized it
2851 e.message += f\"\
Check the log in your dashboard - {liteDebuggerClient.dashboard_url}\"
-> 2852 raise e
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(*args, **kwargs)
2848 if (
2849 liteDebuggerClient and liteDebuggerClient.dashboard_url != None
2850 ): # make it easy to get to the debugger logs if you've initialized it
2851 e.message += f\"\
Check the log in your dashboard - {liteDebuggerClient.dashboard_url}\"
-> 2852 raise e
~/miniforge3/lib/python3.10/site-packages/litellm/main.py in ?(model, input, dimensions, timeout, api_base, api_version, api_key, api_type, caching, user, custom_llm_provider, litellm_call_id, litellm_logging_obj, logger_fn, **kwargs)
2889 api_key=api_key,
2890 original_response=str(e),
2891 )
2892 ## Map to OpenAI Exception
-> 2893 raise exception_type(
2894 model=model, original_exception=e, custom_llm_provider=custom_llm_provider
2895 )
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(model, original_exception, custom_llm_provider, completion_kwargs)
8340 ):
8341 threading.Thread(target=get_all_keys, args=(e.llm_provider,)).start()
8342 # don't let an error with mapping interrupt the user from receiving an error from the llm api calls
8343 if exception_mapping_worked:
-> 8344 raise e
8345 else:
8346 raise original_exception
~/miniforge3/lib/python3.10/site-packages/litellm/utils.py in ?(model, original_exception, custom_llm_provider, completion_kwargs)
8340 ):
8341 threading.Thread(target=get_all_keys, args=(e.llm_provider,)).start()
8342 # don't let an error with mapping interrupt the user from receiving an error from the llm api calls
8343 if exception_mapping_worked:
-> 8344 raise e
8345 else:
8346 raise original_exception
APIError: HuggingfaceException - [\"Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`\"]"
}
RAW RESPONSE: ["Input should be a valid dictionary or instance of SentenceSimilarityInputsCheck: received `['batch_text']` in `parameters`"]
@dhruv-anand-aintech is this a text-embeddings-inference provider?
No, it should be using the hugging face free inference API. Not sure if that's the same as tei in litellm
Do you know if this has been fixed since? With v1.40.3 I still have the issue, FYI.
thanks for the bump @Extremys
looking into this now - it looks like hf embeddings has different api schemas:
- embed
- rerank
- /similarity
based on the TaylorAI hf api example, it looks like it's expecting the /similarity endpoint
it looks like, if we make a GET request to the model, we might be able to tell what format to use based on the pipeline tag:
sentence-similarity = /similarity spec
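A rough sketch of that detection idea (hypothetical, not litellm's actual code; the model id, env var name, and payload shapes are taken from the Hugging Face hub / Inference API docs):

# Hypothetical sketch: pick the request format from the model's pipeline tag.
# Not litellm's implementation; model id and env var name are placeholders.
import os
import requests

model_id = "TaylorAI/bge-micro-v2"
headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_API_KEY']}"}

# The hub metadata endpoint reports the pipeline tag for a model.
meta = requests.get(f"https://huggingface.co/api/models/{model_id}").json()
pipeline_tag = meta.get("pipeline_tag")  # e.g. "sentence-similarity" or "feature-extraction"

api_url = f"https://api-inference.huggingface.co/models/{model_id}"
texts = ["batch_text"]

if pipeline_tag == "sentence-similarity":
    # similarity models expect a source sentence plus candidate sentences,
    # not a bare list - which is what the 400 above complains about
    payload = {"inputs": {"source_sentence": texts[0], "sentences": texts}}
else:
    # feature-extraction style models accept a plain list of strings
    payload = {"inputs": texts}

resp = requests.post(api_url, headers=headers, json=payload)
print(resp.status_code, resp.json())

Note that for sentence-similarity the Inference API returns similarity scores rather than embedding vectors, so mapping that response back into an embedding result is a separate question.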
Great! For my part, to be exact, I'm trying to use a TEI instance, but it should be pretty similar. We have the swagger here: https://huggingface.github.io/text-embeddings-inference/ There is also a /v1/embeddings OpenAI-compatible endpoint (I tried to use it, but without success so far).
what's the error you're seeing? @Extremys
wondering why openai/ doesn't just work https://docs.litellm.ai/docs/providers/openai_compatible#usage---embedding
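For reference, the pattern from that docs page would look roughly like this (the api_base, key, and model name are placeholders for the TEI deployment, so treat it as a sketch rather than a verified call):

# Sketch of the openai/ passthrough route from the linked docs.
# api_base, api_key, and model name below are placeholders for a TEI instance.
import litellm

response = litellm.embedding(
    model="openai/gte-large-en-v1.5",       # openai/ prefix = any OpenAI-compatible endpoint
    api_base="https://my-tei-url.com/v1",   # TEI exposes an OpenAI-style /v1/embeddings
    api_key="<myteikey>",
    input=["This is a test document."],
)
print(response.data[0]["embedding"][:5])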
I'm getting this:
15:30:26 - LiteLLM:INFO: utils.py:1310 -
POST Request Sent from LiteLLM:
curl -X POST \
https://my-tei-url.com/v1 \
-d '{'model': 'gte-large-en-v1.5', 'input': ['This is a test document.'], 'user': 'myuser', 'encoding_format': 'base64', 'no_proxy': True}'
LiteLLM Proxy: Inside Proxy Logging Pre-call hook!
Inside Max Parallel Request Pre-Call Hook
get cache: cache key: <litellmkey>::2024-07-30-15-30::request_count; local_only: False
get cache: cache result: None
current: None
async get cache: cache key: myuser; local_only: False
in_memory_result: None
get cache: cache result: None
Inside Max Budget Limiter Pre-Call Hook
get cache: cache key: myuser_user_api_key_user_id; local_only: False
get cache: cache result: None
Inside Cache Control Check Pre-Call Hook
LiteLLM Proxy: final data being sent to embeddings call: {'input': ['This is a test document.'], 'model': 'gte-large-en-v1.5', 'encoding_format': 'base64', 'proxy_server_request': {'url': 'http://mylitellmurl.com/v1/embeddings', 'method': 'POST', 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'authorization': 'Bearer <litellmkey>', 'x-stainless-async': 'false'}, 'body': {'input': [[2028, 374, 264, 1296, 2246, 13]], 'model': 'gte-large-en-v1.5', 'encoding_format': 'base64'}}, 'user': 'myuser', 'metadata': {'user_api_key': '<litellmkey>', 'user_api_key_metadata': {'app': 'genai_csr'}, 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'x-stainless-async': 'false'}, 'litellm_api_version': '1.40.3', 'user_api_key_alias': None, 'global_max_parallel_requests': None, 'user_api_key_user_id': 'myuser', 'user_api_key_team_id': None, 'user_api_key_team_alias': None, 'endpoint': 'http://mylitellmurl.com/v1/embeddings'}}
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: None
get cache: cache result: None
get cache: cache key: 77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a_async_client; local_only: True
get cache: cache result: <openai.AsyncOpenAI object at 0x7fa5a01af310>
get cache: cache key: 77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a_max_parallel_requests_client; local_only: True
get cache: cache result: None
Request to litellm:
litellm.aembedding(api_key='<myteikey>', api_base='https://my-tei-url.com/v1', model='openai/gte-large-en-v1.5', no_proxy=True, input=['This is a test document.'], caching=False, client=<openai.AsyncOpenAI object at 0x7fa5a01af310>, encoding_format='base64', proxy_server_request={'url': 'http://mylitellmurl.com/v1/embeddings', 'method': 'POST', 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'authorization': 'Bearer <litellmkey>', 'x-stainless-async': 'false'}, 'body': {'input': [[2028, 374, 264, 1296, 2246, 13]], 'model': 'gte-large-en-v1.5', 'encoding_format': 'base64'}}, user='myuser', metadata={'user_api_key': '<litellmkey>', 'user_api_key_metadata': {'app': 'genai_csr'}, 'headers': {'x-forwarded-for': '192.168.248.120', 'x-forwarded-proto': 'https', 'x-forwarded-port': '443', 'host': 'mylitellmurl.com', 'x-amzn-trace-id': 'Root=1-66a90712-72f88f8e2b7c05766c9eb73b', 'content-length': '104', 'accept-encoding': 'gzip, deflate', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.35.10', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.35.10', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.8', 'x-stainless-async': 'false'}, 'litellm_api_version': '1.40.3', 'user_api_key_alias': None, 'global_max_parallel_requests': None, 'user_api_key_user_id': 'myuser', 'user_api_key_team_id': None, 'user_api_key_team_alias': None, 'endpoint': 'http://mylitellmurl.com/v1/embeddings', 'model_group': 'gte-large-en-v1.5', 'deployment': 'openai/gte-large-en-v1.5', 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}, 'api_base': 'https://my-tei-url.com/v1', 'caching_groups': None}, model_info={'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}, timeout=None, max_retries=0)
Initialized litellm callbacks, Async Success Callbacks: [<litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7fa5a0a647c0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7fa5a0a64880>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7fa5a0a64850>, <litellm._service_logger.ServiceLogging object at 0x7fa59ffec5e0>]
init callback list: <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7fa5a0a64850>
init callback list: <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7fa5a0a64880>
init callback list: langfuse
init callback list: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x7fa5a01f9c30>>
init callback list: <bound method SlackAlerting.response_taking_too_long_callback of <litellm.integrations.slack_alerting.SlackAlerting object at 0x7fa5a0a64310>>
init callback list: <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7fa5a0a647c0>
init callback list: <litellm._service_logger.ServiceLogging object at 0x7fa59ffec5e0>
self.optional_params: {}
ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
self.optional_params: {'user': 'myuser', 'encoding_format': 'base64', 'no_proxy': True}
RAW RESPONSE:
AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
Logging Details: logger_fn - None | callable(logger_fn) - False
Traceback (most recent call last):
File "/code/litellm/litellm/utils.py", line 8738, in exception_type
raise APIConnectionError(
litellm.exceptions.APIConnectionError: APIConnectionError: OpenAIException - AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/code/litellm/litellm/proxy/proxy_server.py", line 4809, in embeddings
response = await llm_router.aembedding(**data)
File "/code/litellm/litellm/router.py", line 1742, in aembedding
raise e
File "/code/litellm/litellm/router.py", line 1739, in aembedding
response = await self.async_function_with_fallbacks(**kwargs)
File "/code/litellm/litellm/router.py", line 2101, in async_function_with_fallbacks
raise original_exception
File "/code/litellm/litellm/router.py", line 2005, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
File "/code/litellm/litellm/router.py", line 2197, in async_function_with_retries
raise original_exception
File "/code/litellm/litellm/router.py", line 2120, in async_function_with_retries
response = await original_function(*args, **kwargs)
File "/code/litellm/litellm/router.py", line 1832, in _aembedding
raise e
File "/code/litellm/litellm/router.py", line 1819, in _aembedding
response = await response
File "/code/litellm/litellm/utils.py", line 3901, in wrapper_async
raise e
File "/code/litellm/litellm/utils.py", line 3729, in wrapper_async
result = await original_function(*args, **kwargs)
File "/code/litellm/litellm/main.py", line 2821, in aembedding
raise exception_type(
File "/code/litellm/litellm/utils.py", line 10000, in exception_type
raise original_exception
File "/code/litellm/litellm/main.py", line 2812, in aembedding
response = await init_response
File "/code/litellm/litellm/llms/openai.py", line 945, in aembedding
raise e
File "/code/litellm/litellm/llms/openai.py", line 928, in aembedding
response = await openai_aclient.embeddings.create(**data, timeout=timeout) # type: ignore
TypeError: AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
initial list of deployments: [{'model_name': 'gte-large-en-v1.5', 'litellm_params': {'api_key': '<myteikey>', 'api_base': 'https://my-tei-url.com/v1', 'model': 'openai/gte-large-en-v1.5', 'no_proxy': True}, 'model_info': {'id': '77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a', 'db_model': False}}]
async get cache: cache key: 15-30:cooldown_models; local_only: False
in_memory_result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
get cache: cache result: ['77ae41d109bed17c862cfa5aa200970f7d9bae111584bab3dc76d6b58218953a']
INFO: 192.9.18.5:3132 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
15:30:29 - LiteLLM Router:INFO: router.py:1827 - litellm.aembedding(model=None) Exception No deployments available for selected model, Try again in 60 seconds. Passed model=gte-large-en-v1.5. pre-call-checks=False, allowed_model_region=n/a
15:30:30 - LiteLLM Router:INFO: router.py:1827 - litellm.aembedding(model=None) Exception No deployments available for selected model, Try again in 60 seconds. Passed model=gte-large-en-v1.5. pre-call-checks=False, allowed_model_region=n/a
15:30:30 - LiteLLM Router:INFO: router.py:1827 - litellm.aembedding(model=None) Exception No deployments available for selected model, Try again in 60 seconds. Passed model=gte-large-en-v1.5. pre-call-checks=False, allowed_model_region=n/a
my config file:
model_list:
  - model_name: gte-large-en-v1.5
    litellm_params:
      model: openai/gte-large-en-v1.5
      api_base: "os.environ/TEI_URL"
      api_key: "os.environ/TEI_API_KEY"
      no_proxy: true

environment_variables:
  TEI_URL: "https://my-tei-url.com/v1"
  TEI_API_KEY: <myteikey>

litellm_settings:
  set_verbose: True
  drop_params: True
  success_callback: ["langfuse"]
  request_timeout: 300
the testing code:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    openai_api_base="https://my-litellm-url.com/v1",
    model="gte-large-en-v1.5",
    openai_api_key="<mykey>",
)
text = "This is a test document."
query_result = embeddings.embed_query(text)
doc_result = embeddings.embed_documents([text])
print(query_result)
print(doc_result)
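As a side note, the same check can be run without langchain by pointing the plain OpenAI client at the proxy (the URL, key, and model name below are the placeholders used elsewhere in this thread). Per the log above, langchain sent pre-tokenized input and encoding_format='base64'; this sketch sends plain strings, which can help isolate where the failure happens:

# Minimal sketch: call the litellm proxy directly with the OpenAI client.
# base_url, api_key, and model are placeholders from this thread.
from openai import OpenAI

client = OpenAI(
    base_url="https://my-litellm-url.com/v1",
    api_key="<mykey>",
)

resp = client.embeddings.create(
    model="gte-large-en-v1.5",
    input=["This is a test document."],  # plain strings, no pre-tokenization
)
print(len(resp.data[0].embedding))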
@Extremys can you bump and re-share? We now show the error received from the provider in the 'no deployments available' message.
It should be better now :)
RAW RESPONSE: AsyncEmbeddings.create() got an unexpected keyword argument 'no_proxy'
@Extremys it looks like the no_proxy keyword is causing errors. Is that intended?
Not really, it was for an older version but it's not necessary anymore. I don't know why I didn't catch it before, thank you :)