Matthijs Hollemans
I've investigated adding word-level timestamps to Transformers using OpenAI's approach based on the cross-attention weights. Preliminary results can be found in this Colab: https://colab.research.google.com/drive/1VWbAgzKWQsStdAA1hcumBU2uyFQX7zAB?usp=sharing
Closed by https://github.com/huggingface/transformers/pull/23205
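For reference, a minimal sketch of how word-level timestamps can be requested through the ASR pipeline once that PR landed; the checkpoint name and audio file below are just placeholders:

```python
from transformers import pipeline

# Any Whisper checkpoint should work; "openai/whisper-tiny" is just an example.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# return_timestamps="word" enables the cross-attention-based word alignment.
result = pipe("audio.wav", return_timestamps="word")

# Each chunk is one word with a (start, end) timestamp in seconds.
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```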
@upskyy You may need to use different `attention_heads` on the fine-tuned model. See also: https://gist.github.com/hollance/42e32852f24243b748ae6bc1f985b13a
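In case it helps, here's roughly what that looks like in code. Note that in current Transformers the attribute on the generation config is called `alignment_heads`; the checkpoint name and the head indices below are made up, you'd find the real ones for your model using the gist:

```python
from transformers import WhisperForConditionalGeneration

# Placeholder checkpoint name for a fine-tuned Whisper model.
model = WhisperForConditionalGeneration.from_pretrained("your-fine-tuned-whisper")

# Each entry is [decoder_layer, head]. These values are invented for the
# example; a fine-tuned model often needs a different set of alignment
# heads than the base checkpoint it was trained from.
model.generation_config.alignment_heads = [[2, 2], [3, 0], [3, 2], [3, 3]]
```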
All CoreML tensors are 5-dimensional, but I'm not really sure what happens when the first two dimensions (batch size and sequence length) are not actually 1.
The texture is actually a `texture2d_array`, which has multiple texture slices. Each slice has 4 channels.
No, you just need the one `texture2d_array inTexture [[texture(0)]]`. Notice that its type is `texture2d_array`, not `texture2d`. When you read a pixel, you also specify the slice to read from:...
Almost. Z would go from 0 up to (but not including) 32, since you need to divide the number of channels by 4 (because each texture slice contains 4 channels).
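To make that concrete, a minimal sketch of a kernel that loops over all the slices at one (x, y) position; the 32 slices come from an assumed 128 channels / 4 channels per slice, and the kernel and texture names are made up:

```metal
#include <metal_stdlib>
using namespace metal;

// Reads every channel slice at one (x, y) position and copies it to the
// output. With 128 channels there are 128 / 4 = 32 slices of 4 channels.
kernel void copyAllSlices(
    texture2d_array<half, access::read> inTexture [[texture(0)]],
    texture2d_array<half, access::write> outTexture [[texture(1)]],
    uint3 gid [[thread_position_in_grid]])
{
    if (gid.x >= inTexture.get_width() || gid.y >= inTexture.get_height()) return;

    for (uint slice = 0; slice < inTexture.get_array_size(); ++slice) {
        // Each read returns 4 channels packed into a half4; the second
        // argument selects which slice of the array to read from.
        half4 v = inTexture.read(uint2(gid.x, gid.y), slice);
        outTexture.write(v, uint2(gid.x, gid.y), slice);
    }
}
```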
Hi @nikhilsinghmus, thanks for looking into this! Unfortunately the Replicate website says, "Uh oh! This model can't be run on Replicate because it was built with a version of Cog...
Cheers, I'll have a look. :-) BTW, to test the model with my script I literally used the images from your project page at https://web.media.mit.edu/~nsingh1/image2reverb/
Hi @nikhilsinghmus, that's awesome. Thanks so much for looking into this! Do you think that maybe Trainer sets the random seed to a specific value, and that the model depends...