Matthijs Hollemans

233 comments by Matthijs Hollemans

> I'm not too sure why I'm asked for a review here as all comments from @sanchit-gandhi are being ignored.

No they aren't?! I've integrated most of his suggestions and...

The tokenizer can now handle both the original VITS models (which require phonemization) and the MMS-TTS models.
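As a quick sanity check, the MMS-TTS case works on raw text with no phonemizer backend (a minimal sketch; the checkpoint name is just one example, and original-VITS checkpoints would additionally need the `phonemizer` package installed):

```python
from transformers import VitsTokenizer

# MMS-TTS checkpoints tokenize raw text directly, no phonemization step
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
inputs = tokenizer("Hello, world!", return_tensors="pt")
print(inputs["input_ids"].shape)
```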

IIRC we decided for the time being to keep using `forced_decoder_ids` for the prompts, even though it is indeed slower. It would be nice to improve this.
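For reference, this is roughly what that approach looks like (a sketch; the checkpoint and dummy input are only for illustration). The slowness comes from the fact that each forced token still costs a full decoding step:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# forced_decoder_ids pins a specific token at each decoding position,
# so the forced tokens are produced one by one instead of in a single pass
forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")

input_features = torch.randn(1, 80, 3000)  # dummy log-mel input
output = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
```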

I don't understand this part of the generation process well enough yet to say anything useful about it. You'd think that we could start generation by passing in the entire...

@Narsil For Whisper, we want to start generation not with a single "BOS" token (here, `<|startoftranscript|>`) but with several tokens. In the case of prompting, this could be a fairly...
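To illustrate what "several tokens" means here (a sketch using the tokenizer's `prefix_tokens` property; the exact tokens depend on the language/task settings):

```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny")
# the decoder is primed with a whole sequence of special tokens, not one BOS
print(tokenizer.convert_ids_to_tokens(tokenizer.prefix_tokens))
# e.g. ['<|startoftranscript|>', '<|en|>', '<|transcribe|>', '<|notimestamps|>']
```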

There is an [open PR](https://github.com/huggingface/transformers/pull/15773) for FastSpeech2. I think this is a good new model to add. If anyone is interested in taking that PR to completion, that would be...

I cleaned up `hertz_to_mel` and `mel_to_hertz` a bit:
- more consistent doc comments
- both support single float inputs as well as numpy arrays
- simplified the formulas so it's...
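For reference, a minimal sketch of the HTK variant of the conversion (the real functions also support other mel scales); `np.log10` transparently handles both scalars and arrays:

```python
import numpy as np

def hertz_to_mel(freq):
    # HTK mel scale: mel = 2595 * log10(1 + hz / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(freq, dtype=np.float64) / 700.0)

def mel_to_hertz(mels):
    # exact inverse of the above
    return 700.0 * (np.power(10.0, np.asarray(mels, dtype=np.float64) / 2595.0) - 1.0)

print(hertz_to_mel(1000.0))                     # single float input
print(hertz_to_mel(np.array([440.0, 1000.0])))  # numpy array input
```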

I rewrote `power_to_db` and added `amplitude_to_db`. They still work like the librosa versions but with argument names that make more sense to me.
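A sketch of the renamed version, assuming the argument names `reference`, `min_value`, and `db_range` (librosa calls these `ref`, `amin`, and `top_db`):

```python
import numpy as np

def power_to_db(spectrogram, reference=1.0, min_value=1e-10, db_range=None):
    # convert a power spectrogram to decibels: 10 * log10(S / reference),
    # flooring both S and the reference at min_value to avoid log(0)
    db = 10.0 * np.log10(np.clip(spectrogram, a_min=min_value, a_max=None))
    db -= 10.0 * np.log10(max(min_value, reference))
    if db_range is not None:
        # keep only values within db_range of the peak
        db = np.clip(db, a_min=db.max() - db_range, a_max=None)
    return db

# amplitude_to_db is the same idea with a factor of 20 instead of 10
```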

Changed `get_mel_filter_banks` into `mel_filter_bank`. Mostly renamed arguments and variables and cleaned up the doc comments, so that the naming is more in line with the rest of Transformers, e.g. `num_frequency_bins`...
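Usage now looks roughly like this (a sketch; the parameter values are placeholders and the keyword names assume the renamed API):

```python
from transformers.audio_utils import mel_filter_bank

filters = mel_filter_bank(
    num_frequency_bins=201,   # (n_fft // 2) + 1 for an n_fft of 400
    num_mel_filters=80,
    min_frequency=0.0,
    max_frequency=8000.0,
    sampling_rate=16000,
)
print(filters.shape)  # (num_frequency_bins, num_mel_filters)
```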

Pushed significant changes to the `stft` code.
- Removed `fram_wave`; this is really an implementation detail that should happen inside the STFT.
- The new `stft` gives the same results...
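For intuition, framing the waveform is now an internal detail of the transform, conceptually like this (a minimal sketch, not the actual implementation):

```python
import numpy as np

def stft(waveform, window, hop_length):
    # framing happens inside the STFT rather than in a separate fram_wave step
    frame_length = len(window)
    num_frames = 1 + (len(waveform) - frame_length) // hop_length
    num_bins = frame_length // 2 + 1
    result = np.empty((num_frames, num_bins), dtype=np.complex64)
    for i in range(num_frames):
        frame = waveform[i * hop_length : i * hop_length + frame_length]
        result[i] = np.fft.rfft(frame * window)
    return result

spec = stft(np.random.randn(16000), np.hanning(400), hop_length=160)
print(spec.shape)  # (num_frames, num_bins)
```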