Pikauba

9 comments by Pikauba

I actually observed the same issue, @black-puppydog. I was wondering why the regularization term eye() was so big. I believe that mathematically it's a mistake to leave it how it...

The addition already acts like a non-linearity. It's a design choice; you can see the same in CycleGAN and pix2pixHD. It actually allows it to output negative values.
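To illustrate the point about negative values, here is a minimal sketch in plain Python. It assumes the comment refers to the skip addition of a residual block that has no activation after it; `residual_block` and `toy_branch` are hypothetical names used only for this illustration, not code from any of the repositories mentioned.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def residual_block(x, branch, final_relu=False):
    """Add the branch output to the skip connection; optionally apply ReLU afterwards."""
    out = [xi + bi for xi, bi in zip(x, branch(x))]
    return relu(out) if final_relu else out


def toy_branch(x):
    """Hypothetical stand-in for the convolutional branch: negates and scales its input."""
    return [-2.0 * xi for xi in x]


x = [1.0, -0.5, 3.0]
without_relu = residual_block(x, toy_branch)                 # negative outputs survive
with_relu = residual_block(x, toy_branch, final_relu=True)   # negative outputs are clamped to 0
print(without_relu, with_relu)
```

Without the trailing ReLU the block can pass negative values on to the next layer, which is the behaviour the comment describes.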

As described in your link, "In 2.2, the legacy global backend mechanism will be removed. Utility functions [get_audio_backend()](https://pytorch.org/audio/stable/generated/torchaudio.get_audio_backend.html#torchaudio.get_audio_backend) and [set_audio_backend()](https://pytorch.org/audio/stable/generated/torchaudio.set_audio_backend.html#torchaudio.set_audio_backend) become no-op." Considering that for now pyannote-audio has a requirement...

Be careful about this so-called [Fix](https://github.com/Vaibhavs10/insanely-fast-whisper/blob/355275fe7c05578a1c948452ff063f60a9670cc6/src/insanely_fast_whisper/utils/diarize.py#L147C16-L147C30). As it is the exact same code used in speechbox (I wonder why the speechbox library is not directly integrated in this repo...

I adapted the PowerIteration method to work with batched matrices and eigenvectors, if you are interested.

```python
class PowerIteration(torch.autograd.Function):
    @staticmethod
    def forward(ctx, M, v, n_iter=19):
        ctx.n_iter = n_iter
        ctx.save_for_backward(M, v)
        ...
```
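For readers unfamiliar with the technique, the following is a minimal sketch of the forward iteration only, written in plain Python over a list of matrices. It is not the autograd `Function` from the comment (whose body is truncated above) and has no backward pass; `batched_power_iteration`, `matvec`, and `normalize` are hypothetical helper names, and the default of 19 iterations simply mirrors the `n_iter=19` seen in the snippet.

```python
import math


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def batched_power_iteration(Ms, vs, n_iter=19):
    """Run n_iter power-iteration steps on each (matrix, start vector) pair.

    Returns one (eigenvalue estimate, unit eigenvector estimate) per pair.
    """
    results = []
    for M, v in zip(Ms, vs):
        v = normalize(v)
        for _ in range(n_iter):
            v = normalize(matvec(M, v))
        # Rayleigh quotient v^T M v; v already has unit norm.
        eigenvalue = sum(a * b for a, b in zip(v, matvec(M, v)))
        results.append((eigenvalue, v))
    return results


batch = [
    [[2.0, 0.0], [0.0, 1.0]],  # dominant eigenvalue 2
    [[4.0, 1.0], [1.0, 4.0]],  # dominant eigenvalue 5
]
starts = [[1.0, 1.0], [1.0, 0.0]]
estimates = batched_power_iteration(batch, starts)
for eigenvalue, vector in estimates:
    print(round(eigenvalue, 6), [round(c, 4) for c in vector])
```

In a tensor library the loop over the batch would normally become a single batched matrix product; the per-matrix loop here only keeps the sketch dependency-free.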

Thank you for pointing that out! I looked at the code and wonder if obtaining the confidence per token once the whole auto-regression process is completed by applying a feed-forward...

Ok! That makes sense given the information in the paper. Indeed, as I explained, `--vad_min_duration_off` will let the user choose how to deal with short-duration silences and will leave...
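As a rough illustration of what such a threshold does, the sketch below merges speech segments whose separating silence is shorter than a minimum duration. This is an assumption about the intended behaviour, not the project's actual implementation; `merge_short_silences` is a hypothetical helper and only the parameter name `min_duration_off` is borrowed from the flag.

```python
def merge_short_silences(segments, min_duration_off):
    """Merge speech segments separated by a silence shorter than min_duration_off.

    segments: list of (start, end) times in seconds, sorted by start time.
    """
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] < min_duration_off:
            # The gap is too short to count as silence: extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


speech = [(0.0, 1.0), (1.2, 2.0), (3.5, 4.0)]
print(merge_short_silences(speech, min_duration_off=0.5))
print(merge_short_silences(speech, min_duration_off=0.0))
```

With a threshold of 0.5 s the 0.2 s pause is absorbed into one segment, while a threshold of 0 leaves every silence in place.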

You could apply an STFT (if you have a non-stationary signal) to your signal and use the resulting 2D Signal-Time and amplitude "images". Also, it appears that there is a 1D...
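A minimal sketch of that idea, using only the standard library: the signal is cut into overlapping Hann-windowed frames and each frame is passed through a direct DFT, giving a 2D array of magnitudes (time frames by frequency bins). The function name `stft_magnitude`, the frame size, and the hop length are illustrative assumptions; in practice a library STFT would be much faster.

```python
import cmath
import math


def stft_magnitude(signal, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_size // 2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            coeff = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# Non-stationary toy signal: 4 cycles per 64 samples in the first half, 12 in the second.
n_samples = 512
signal = [
    math.sin(2 * math.pi * (4 if n < n_samples // 2 else 12) * n / 64)
    for n in range(n_samples)
]
image = stft_magnitude(signal)
# Strongest frequency bin in each time frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in image]
print(len(image), len(image[0]), peak_bins)
```

The peak bin moves from 4 to 12 across the frames, which is the kind of time-dependent structure a 2D model can pick up from the magnitude "image".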

My issue was closed as a duplicate, and it seems the question has not been answered clearly here. What is the fundamental reason for forcing the model to predict flow...