
429 error in InferenceClient

Open sooryansatheesh opened this issue 10 months ago • 2 comments

System Info

429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models

Who can help?

@SunMarc


Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I got the above error when I was trying to get tabular classification predictions from my own model.

I used the code below:

import pandas as pd
from huggingface_hub import InferenceClient

input_data = [2, 3, 4, 2, 4]
df = pd.DataFrame([input_data], columns=cols_used)

client = InferenceClient()

table = df.to_dict(orient="records")
print(table)

client.tabular_classification(table=table, model=model_id)

Can someone help me?

Expected behavior

Prediction in the form of a single number from the model

sooryansatheesh avatar Mar 29 '24 06:03 sooryansatheesh

cc @Wauplin

ArthurZucker avatar Mar 30 '24 17:03 ArthurZucker

@sooryansatheesh In your reproducible script above

from huggingface_hub import InferenceClient
input_data = [2, 3, 4, 2, 4]
df = pd.DataFrame([input_data], columns=cols_used)

client = InferenceClient()

table = df.to_dict(orient="records")

print(table)
client.tabular_classification(table=table, model=model_id)

would you mind sharing what values you used for cols_used and model_id? Without them, it's hard to reproduce.

In general, HTTP 429 means you got rate-limited. Using an HF token should raise the rate limit, which might solve your situation. Another possibility is that your model doesn't load on our Inference API servers, but to investigate that we would need the model id.
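
For illustration, a minimal sketch of an authenticated call, assuming HF_TOKEN holds a valid User Access Token, that the Inference API accepts a column-name-to-values mapping for table, and that the column names and model id below are placeholders for your own values:

import os

from huggingface_hub import InferenceClient

# Authenticated requests count against a higher rate limit than anonymous ones.
client = InferenceClient(token=os.environ["HF_TOKEN"])

# Placeholder table (one row, two columns) and placeholder model id.
table = {"feature_a": [2], "feature_b": [3]}
predictions = client.tabular_classification(table=table, model="username/my-tabular-model")
print(predictions)  # expected: one predicted label per row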

Wauplin avatar Apr 02 '24 08:04 Wauplin

@Wauplin thanks for your work. Unfortunately, this week I encountered a similar issue multiple times.

While authenticated, I tried to use google/gemma-1.1-2b-it (I have access) some days ago. The model was probably loading, so I interrupted the request after several minutes and then I got 429 and was blocked for an hour.

The same happened today with Qwen/Qwen2-7B-Instruct-AWQ.

Is it a known issue? Is there a way to raise an error in these cases and avoid hitting the rate limit?
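
One possible workaround is to check whether the model is loaded before sending the request and raise early otherwise; a minimal sketch, assuming InferenceClient.get_model_status is available and exposes a loaded flag and state string:

from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_xxx")  # placeholder token

model_id = "google/gemma-1.1-2b-it"
status = client.get_model_status(model_id)

# Raise early instead of waiting (and getting rate limited) while the model loads.
if not status.loaded:
    raise RuntimeError(f"{model_id} is not loaded yet (state: {status.state}); retry later.")

print(client.text_generation("Hello!", model=model_id, max_new_tokens=20))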

anakin87 avatar Jun 06 '24 20:06 anakin87

Hi @anakin87, thanks for reporting. To improve the user experience, I opened https://github.com/huggingface/huggingface_hub/pull/2318, which will add X-wait-for-model as a header. This way, InferenceClient won't send requests every second until the model is loaded.
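
Until that change is released, a minimal sketch of setting the header manually, assuming InferenceClient forwards custom headers to the Inference API and that a value of "true" is honored:

from huggingface_hub import InferenceClient

# Ask the Inference API to hold the request until the model is loaded,
# instead of returning an error that triggers client-side retries.
client = InferenceClient(
    token="hf_xxx",  # placeholder token
    headers={"X-wait-for-model": "true"},
)

print(client.text_generation("Hello!", model="Qwen/Qwen2-7B-Instruct-AWQ", max_new_tokens=20))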

Wauplin avatar Jun 07 '24 13:06 Wauplin