Error with the huggingface hf-speech-bench
I followed this blog post and used it to fine-tune the Whisper model on a custom dataset, but after training, when I try to run:

```python
trainer.push_to_hub(**kwargs)
```
it throws this error:

```
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    260     try:
--> 261         response.raise_for_status()
    262     except HTTPError as e:

9 frames

HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/models/valacodes/whisper-small-hausa/commit/main

The above exception was the direct cause of the following exception:

BadRequestError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    297             f"\n\nBad request for {endpoint_name} endpoint:" if endpoint_name is not None else "\n\nBad request:"
    298         )
--> 299         raise BadRequestError(message, response=response) from e
    300
    301     # Convert HTTPError into a HfHubHTTPError to display request information

BadRequestError: (Request ID: Root=1-65271724-79a6b33830e49217395944e2;736a08e9-3998-4e6f-b43e-86df049f04ed)

Bad request for commit endpoint:
"model-index[0].results[0].dataset.config" must be a string
```

and visiting the hf-speech-bench webpage shows this:

```
TypeError: string indices must be integers

Traceback:
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
File "/home/user/app/app.py", line 143, in <module>
    dataframe = get_data()
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/legacy_caching/caching.py", line 715, in wrapped_func
    return get_or_create_cached_value()
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/legacy_caching/caching.py", line 696, in get_or_create_cached_value
    return_value = non_optional_func(*args, **kwargs)
File "/home/user/app/app.py", line 107, in get_data
    for row in parse_metrics_rows(meta):
File "/home/user/app/app.py", line 72, in parse_metrics_rows
    lang = result["dataset"]["args"]["language"]
```
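For context, the 400 comes from the Hub validating the model card's `model-index` metadata on commit: `results[0].dataset.config` must be a plain string (e.g. a Common Voice config name). A hypothetical sketch of the shape it expects (the field values here are illustrative, not taken from the actual repo):

```python
# Illustrative model-index fragment; values are assumptions for demonstration.
model_index = [{
    "name": "whisper-small-hausa",
    "results": [{
        "dataset": {
            "name": "Common Voice 11.0",
            "type": "mozilla-foundation/common_voice_11_0",
            "config": "ha",   # must be a string, not a dict or list
            "split": "test",
        },
    }],
}]

# This is the check the commit endpoint reports as failing:
config = model_index[0]["results"][0]["dataset"]["config"]
assert isinstance(config, str), "dataset.config must be a string"
```

If the metadata written by `push_to_hub` puts anything other than a string in that slot, the commit endpoint rejects it with exactly the 400 above.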
I am having the same issue.
I was having the same issue, and the cause was something that looked unrelated: apparently there's a bug when fetching the dataset metadata if you use multiple processes in any function applied to the dataset (like map or filter).
That ended up breaking the push_to_hub parse of the dataset information.
The discussion happened here and there's a PR on the way.
In the meantime, it's working for me with num_proc=1 on my maps and filters. I'm not sure whether it also works if you remove the dataset information altogether from the **kwargs used in trainer.push_to_hub(**kwargs), so you can give that a shot as well.
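A minimal sketch of both workarounds. The key names mirror the kwargs layout from the fine-tuning blog but are illustrative here, and `common_voice`/`prepare_dataset`/`trainer` are assumed to exist in your notebook:

```python
# Workaround 1 (hypothetical call): run preprocessing single-process so the
# dataset metadata gets written correctly.
# common_voice = common_voice.map(prepare_dataset, num_proc=1)

# Workaround 2: drop the dataset-related keys from kwargs before pushing,
# so push_to_hub never tries to build the model-index dataset entry.
kwargs = {
    "dataset_tags": "mozilla-foundation/common_voice_11_0",
    "dataset": "Common Voice 11.0",
    "dataset_args": "config: ha, split: test",
    "language": "ha",
    "model_name": "whisper-small-hausa",
    "finetuned_from": "openai/whisper-small",
    "tasks": "automatic-speech-recognition",
}
kwargs_no_dataset = {k: v for k, v in kwargs.items()
                     if not k.startswith("dataset")}
# trainer.push_to_hub(**kwargs_no_dataset)
```

Workaround 1 is the one confirmed to work for me; workaround 2 is untested, as noted above.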