Matthijs Hollemans

233 comments by Matthijs Hollemans

> I'm not too sure why I'm asked for a review here as all comments from @sanchit-gandhi are being ignored.

No they aren't?! I've integrated most of his suggestions and...

The tokenizer can now handle both the original VITS models (which require phonemization) and the MMS-TTS models.
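As a quick sanity check, the MMS-TTS case works on raw text with no phonemizer backend (a minimal sketch; the checkpoint name is just one example, and original-VITS checkpoints would additionally need the `phonemizer` package installed):

```python
from transformers import VitsTokenizer

# MMS-TTS checkpoints tokenize raw text directly, no phonemization step
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
inputs = tokenizer("Hello, world!", return_tensors="pt")
print(inputs["input_ids"].shape)
```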

IIRC we decided for the time being to keep using `forced_decoder_ids` for the prompts, even though it is indeed slower. It would be nice to improve this.
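For reference, this is roughly what that approach looks like (a sketch; the checkpoint and dummy input are only for illustration). The slowness comes from the fact that each forced token still costs a full decoding step:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# forced_decoder_ids pins a specific token at each decoding position,
# so the forced tokens are produced one by one instead of in a single pass
forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")

input_features = torch.randn(1, 80, 3000)  # dummy log-mel input
output = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
```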

I don't understand this part of the generation process well enough yet to say anything useful about it. You'd think that we could start generation by passing in the entire...

@Narsil For Whisper, we want to start generation not with a single "BOS" token (here, `<|startoftranscript|>`) but with several tokens. In the case of prompting, this could be a fairly...
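To illustrate what "several tokens" means here (a sketch using the tokenizer's `prefix_tokens` property; the exact tokens depend on the language/task settings):

```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny")
# the decoder is primed with a whole sequence of special tokens, not one BOS
print(tokenizer.convert_ids_to_tokens(tokenizer.prefix_tokens))
# e.g. ['<|startoftranscript|>', '<|en|>', '<|transcribe|>', '<|notimestamps|>']
```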

There is an [open PR](https://github.com/huggingface/transformers/pull/15773) for FastSpeech2. I think this is a good new model to add. If anyone is interested in taking that PR to completion, that would be...

I cleaned up `hertz_to_mel` and `mel_to_hertz` a bit:
- more consistent doc comments
- both support single float inputs as well as numpy arrays
- simplified the formulas so it's...
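For reference, a minimal sketch of the HTK variant of the conversion (the real functions also support other mel scales); `np.log10` transparently handles both scalars and arrays:

```python
import numpy as np

def hertz_to_mel(freq):
    # HTK mel scale: mel = 2595 * log10(1 + hz / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(freq, dtype=np.float64) / 700.0)

def mel_to_hertz(mels):
    # exact inverse of the above
    return 700.0 * (np.power(10.0, np.asarray(mels, dtype=np.float64) / 2595.0) - 1.0)

print(hertz_to_mel(1000.0))                     # single float input
print(hertz_to_mel(np.array([440.0, 1000.0])))  # numpy array input
```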

I rewrote `power_to_db` and added `amplitude_to_db`. They still work like the librosa versions but with argument names that make more sense to me.
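A sketch of the renamed version, assuming the argument names `reference`, `min_value`, and `db_range` (librosa calls these `ref`, `amin`, and `top_db`):

```python
import numpy as np

def power_to_db(spectrogram, reference=1.0, min_value=1e-10, db_range=None):
    # convert a power spectrogram to decibels: 10 * log10(S / reference),
    # flooring both S and the reference at min_value to avoid log(0)
    db = 10.0 * np.log10(np.clip(spectrogram, a_min=min_value, a_max=None))
    db -= 10.0 * np.log10(max(min_value, reference))
    if db_range is not None:
        # keep only values within db_range of the peak
        db = np.clip(db, a_min=db.max() - db_range, a_max=None)
    return db

# amplitude_to_db is the same idea with a factor of 20 instead of 10
```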

Changed `get_mel_filter_banks` into `mel_filter_bank`. Mostly renamed arguments and variables and cleaned up the doc comments, so that the naming is more in line with the rest of Transformers, e.g. `num_frequency_bins`...
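Usage now looks roughly like this (a sketch; the parameter values are placeholders and the keyword names assume the renamed API):

```python
from transformers.audio_utils import mel_filter_bank

filters = mel_filter_bank(
    num_frequency_bins=201,   # (n_fft // 2) + 1 for an n_fft of 400
    num_mel_filters=80,
    min_frequency=0.0,
    max_frequency=8000.0,
    sampling_rate=16000,
)
print(filters.shape)  # (num_frequency_bins, num_mel_filters)
```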

Pushed significant changes to the `stft` code.
- Removed `fram_wave`; this is really an implementation detail that should happen inside the STFT.
- The new `stft` gives the same results...
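For intuition, framing the waveform is now an internal detail of the transform, conceptually like this (a minimal sketch, not the actual implementation):

```python
import numpy as np

def stft(waveform, window, hop_length):
    # framing happens inside the STFT rather than in a separate fram_wave step
    frame_length = len(window)
    num_frames = 1 + (len(waveform) - frame_length) // hop_length
    num_bins = frame_length // 2 + 1
    result = np.empty((num_frames, num_bins), dtype=np.complex64)
    for i in range(num_frames):
        frame = waveform[i * hop_length : i * hop_length + frame_length]
        result[i] = np.fft.rfft(frame * window)
    return result

spec = stft(np.random.randn(16000), np.hanning(400), hop_length=160)
print(spec.shape)  # (num_frames, num_bins)
```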