
Error with the huggingface hf-speech-bench

Open valaofficial opened this issue 2 years ago • 2 comments

I followed this blog post and used it to fine-tune the Whisper model on a custom dataset, but after training, when I try to run this command

trainer.push_to_hub(**kwargs)

it throws this error:

```
HTTPError                                 Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    260     try:
--> 261         response.raise_for_status()
    262     except HTTPError as e:

 9 frames
HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/models/valacodes/whisper-small-hausa/commit/main

The above exception was the direct cause of the following exception:

BadRequestError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    297                 f"\n\nBad request for {endpoint_name} endpoint:" if endpoint_name is not None else "\n\nBad request:"
    298             )
--> 299             raise BadRequestError(message, response=response) from e
    300 
    301         # Convert HTTPError into a HfHubHTTPError to display request information

BadRequestError:  (Request ID: Root=1-65271724-79a6b33830e49217395944e2;736a08e9-3998-4e6f-b43e-86df049f04ed)

Bad request for commit endpoint:
"model-index[0].results[0].dataset.config" must be a string
```

and visiting the hf-speech-bench webpage shows this:

```
TypeError: string indices must be integers
Traceback:
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
File "/home/user/app/app.py", line 143, in <module>
    dataframe = get_data()
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/legacy_caching/caching.py", line 715, in wrapped_func
    return get_or_create_cached_value()
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/legacy_caching/caching.py", line 696, in get_or_create_cached_value
    return_value = non_optional_func(*args, **kwargs)
File "/home/user/app/app.py", line 107, in get_data
    for row in parse_metrics_rows(meta):
File "/home/user/app/app.py", line 72, in parse_metrics_rows
    lang = result["dataset"]["args"]["language"]
```
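
The leaderboard's TypeError is the same metadata problem seen from the other side: the line `result["dataset"]["args"]["language"]` only works if those lookups return dicts, and "string indices must be integers" means one of them hit a plain string instead. A tiny illustration (the values are made up, not the app's actual data):

```python
# What the leaderboard's parsing expects vs. a card whose "args" is a string.
result_ok = {"dataset": {"args": {"language": "ha"}}}           # args is a dict
result_bad = {"dataset": {"args": "config: ha, split: test"}}   # args is a string

result_ok["dataset"]["args"]["language"]    # -> "ha"
result_bad["dataset"]["args"]["language"]   # TypeError: string indices must be integers
```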

valaofficial · Oct 30 '23 17:10

I am having the same issue.

ahmed8047762 · Nov 02 '23 20:11

I was having the same issue, and the reason was something that looked unrelated: apparently there's a bug when fetching the dataset metadata if you use multiple processes in any function applied to the dataset (like map or filter). That ended up breaking the parsing of the dataset information during push_to_hub. The discussion happened here and there's a PR on the way.

In the meantime, it works for me when I use num_proc=1 on my map and filter calls. I'm not sure if it also works if you remove the dataset information altogether from the **kwargs passed to trainer.push_to_hub(**kwargs), so you can give that a shot as well.
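
A minimal sketch of both workarounds, assuming a setup along the lines of the blog post (the common_voice variable, prepare_dataset function, and kwargs values below are illustrative):

```python
# Workaround 1: run dataset preprocessing single-process so the dataset
# metadata that push_to_hub later records is parsed correctly.
common_voice = common_voice.map(
    prepare_dataset,
    remove_columns=common_voice.column_names["train"],
    num_proc=1,  # instead of e.g. num_proc=4
)

# Workaround 2 (untested, as noted above): omit the dataset-related fields
# from the kwargs so push_to_hub writes no dataset metadata at all.
kwargs = {
    "language": "ha",
    "model_name": "Whisper Small Hausa",
    "finetuned_from": "openai/whisper-small",
    "tasks": "automatic-speech-recognition",
    # "dataset_tags", "dataset" and "dataset_args" left out
}
trainer.push_to_hub(**kwargs)
```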

thiagobarbosa · Jan 15 '24 07:01