429 error in InferenceClient
System Info

429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models

Who can help?

@SunMarc
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I got the above error when trying to get tabular classification predictions from my own model. I used the code below:

```python
from huggingface_hub import InferenceClient
import pandas as pd  # needed for pd.DataFrame below

input_data = [2, 3, 4, 2, 4]
df = pd.DataFrame([input_data], columns=cols_used)
client = InferenceClient()
table = df.to_dict(orient="records")
print(table)
client.tabular_classification(table=table, model=model_id)
```

Can someone help me?
Expected behavior
A prediction in the form of a single number from the model.
cc @Wauplin
@sooryansatheesh In your reproducible script above

```python
from huggingface_hub import InferenceClient
import pandas as pd

input_data = [2, 3, 4, 2, 4]
df = pd.DataFrame([input_data], columns=cols_used)
client = InferenceClient()
table = df.to_dict(orient="records")
print(table)
client.tabular_classification(table=table, model=model_id)
```

would you mind sharing what values you used for `cols_used` and `model_id`? Without them, it's hard to reproduce.
In general, HTTP 429 means you were rate-limited. Using an HF token should raise the rate limit, which might resolve your situation. Another possibility is that your model doesn't load on our Inference API servers, but to investigate that we would need the model id.
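If it helps, here's a minimal sketch of authenticating the client; the token value is a placeholder for a real User Access Token:

```python
from huggingface_hub import InferenceClient

# Authenticated requests get a higher rate limit than anonymous ones.
# "hf_xxx" is a placeholder; create a real token at
# https://huggingface.co/settings/tokens
client = InferenceClient(token="hf_xxx")
```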
@Wauplin thanks for your work. Unfortunately, this week I encountered a similar issue multiple times.

While authenticated, I tried to use `google/gemma-1.1-2b-it` (I have access) a few days ago. The model was probably loading, so I interrupted the request after several minutes; I then got a 429 and was blocked for an hour. The same happened today with `Qwen/Qwen2-7B-Instruct-AWQ`.

Is this a known issue? Is there a way to raise an error in these cases and avoid hitting the rate limit?
Hi @anakin87, thanks for reporting. To improve the user experience, I opened https://github.com/huggingface/huggingface_hub/pull/2318, which will add `X-wait-for-model` as a header. This way, `InferenceClient` won't send a request every second while the model is loading.
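Until that lands, a rough sketch of setting the header yourself via the client's `headers` argument (the token is a placeholder, and this assumes the server honors `X-wait-for-model` as documented):

```python
from huggingface_hub import InferenceClient

# Ask the Inference API to hold the request open until the model is
# loaded, instead of the client retrying every second.
client = InferenceClient(
    token="hf_xxx",  # placeholder token
    headers={"X-wait-for-model": "true"},
)
```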