
Error when fine tuning whisper model

Open valaofficial opened this issue 2 years ago • 6 comments

I followed this blog post to fine-tune the Whisper model on a custom dataset, but after training, when I run this command

trainer.push_to_hub(**kwargs)

it throws this error

HTTPError                                 Traceback (most recent call last)
 
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    260     try:
--> 261         response.raise_for_status()
    262     except HTTPError as e:

 9 frames
HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/models/valacodes/whisper-small-hausa/commit/main

The above exception was the direct cause of the following exception:

BadRequestError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    297                 f"\n\nBad request for {endpoint_name} endpoint:" if endpoint_name is not None else "\n\nBad request:"
    298             )
--> 299             raise BadRequestError(message, response=response) from e
    300 
    301         # Convert HTTPError into a HfHubHTTPError to display request information

BadRequestError:  (Request ID: Root=1-65271724-79a6b33830e49217395944e2;736a08e9-3998-4e6f-b43e-86df049f04ed)

Bad request for commit endpoint:
"model-index[0].results[0].dataset.config" must be a string

and visiting the hf-speech-bench webpage shows this

TypeError: string indices must be integers
Traceback:
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
File "/home/user/app/app.py", line 143, in <module>
    dataframe = get_data()
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/legacy_caching/caching.py", line 715, in wrapped_func
    return get_or_create_cached_value()
File "/home/user/.local/lib/python3.10/site-packages/streamlit/runtime/legacy_caching/caching.py", line 696, in get_or_create_cached_value
    return_value = non_optional_func(*args, **kwargs)
File "/home/user/app/app.py", line 107, in get_data
    for row in parse_metrics_rows(meta):
File "/home/user/app/app.py", line 72, in parse_metrics_rows
    lang = result["dataset"]["args"]["language"]
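
For readers unfamiliar with this `TypeError`: Python raises it when a string is indexed with a string key, so at least one of the values indexed on the last line of the traceback was a string rather than a dict. The sketch below reproduces that failure and shows a defensive lookup. The `get_language` helper and both sample dicts are hypothetical illustrations, not the leaderboard app's actual code or data.

```python
def get_language(result):
    """Return result["dataset"]["args"]["language"], or None if the shape is unexpected.

    Hypothetical helper: it only illustrates guarding against string-valued fields.
    """
    if not isinstance(result, dict):
        return None
    dataset = result.get("dataset")
    if not isinstance(dataset, dict):
        return None
    args = dataset.get("args")
    if not isinstance(args, dict):
        return None
    return args.get("language")


# Hypothetical metadata shapes (assumed for illustration only).
good = {"dataset": {"args": {"language": "ha"}}}
bad = {"dataset": {"args": "config: ha, split: test"}}  # args is a plain string

# The unguarded lookup from the traceback fails on the string-valued entry.
try:
    bad["dataset"]["args"]["language"]
except TypeError as err:
    print(type(err).__name__)

print(get_language(good))
print(get_language(bad))
```

A guard like this would only hide the symptom on the leaderboard side; the metadata in the model card would still need to be in the expected shape.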

valaofficial avatar Oct 11 '23 22:10 valaofficial

@valaofficial I have the same error, any news?

lombardata avatar Nov 07 '23 12:11 lombardata

cc @sanchit-gandhi :)

pcuenca avatar Nov 07 '23 14:11 pcuenca

I came here after having some issues figuring out kwargs and what was expected through the push_to_hub method.

However, I did manage to publish the model and use it with Gradio by adding the following:

trainer.save_model() 
trainer.push_to_hub() 
tokenizer.push_to_hub("username/model-id") 

Doc Links

garethpaul avatar Nov 19 '23 01:11 garethpaul

Has anyone found a solution to the issue? I am experiencing the same problem.

Mirodil avatar Nov 22 '23 02:11 Mirodil

I assume you're also following this notebook: https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb. Commenting out "dataset_tags" in the kwargs dict worked for me, although I'm not sure why.
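
A minimal sketch of that workaround, assuming a kwargs dict shaped like the one in the notebook. The values below are placeholders and `drop_model_card_keys` is a hypothetical helper, not part of transformers. It builds a copy of the dict without "dataset_tags" so the original stays untouched.

```python
def drop_model_card_keys(kwargs, keys_to_drop=("dataset_tags",)):
    """Return a copy of kwargs without the given model-card keys (hypothetical helper)."""
    return {key: value for key, value in kwargs.items() if key not in keys_to_drop}


# Placeholder values; substitute the ones used for your own fine-tuning run.
kwargs = {
    "dataset_tags": "your-dataset-id",
    "dataset": "Your Dataset Name",
    "language": "ha",
    "model_name": "Whisper Small Hausa",
    "finetuned_from": "openai/whisper-small",
    "tasks": "automatic-speech-recognition",
}

safe_kwargs = drop_model_card_keys(kwargs)
print(sorted(safe_kwargs))

# With the `trainer` object from the notebook in scope, the push would then be:
# trainer.push_to_hub(**safe_kwargs)
```

This only mirrors the workaround reported above. The thread does not establish why removing "dataset_tags" avoids the 400 error, and the resulting model card will carry less dataset metadata.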

halannhile avatar Dec 12 '23 00:12 halannhile

I hit the same problem when following Unit 4 of the Audio Course.

SpellingDragon avatar Dec 24 '23 16:12 SpellingDragon