Matthijs Hollemans
I've investigated adding word-level timestamps to Transformers using OpenAI's approach based on the cross-attention weights. Preliminary results can be found in this Colab: https://colab.research.google.com/drive/1VWbAgzKWQsStdAA1hcumBU2uyFQX7zAB?usp=sharing
Closed by https://github.com/huggingface/transformers/pull/23205
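For reference, a minimal sketch of how word-level timestamps can be requested through the ASR pipeline once that PR landed; the checkpoint name and audio file below are just placeholders:

```python
from transformers import pipeline

# Any Whisper checkpoint should work; "openai/whisper-tiny" is just an example.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# return_timestamps="word" enables the cross-attention-based word alignment.
result = pipe("audio.wav", return_timestamps="word")

# Each chunk is one word with a (start, end) timestamp in seconds.
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```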
@upskyy You may need to use different `attention_heads` on the fine-tuned model. See also: https://gist.github.com/hollance/42e32852f24243b748ae6bc1f985b13a
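In case it helps, here's roughly what that looks like in code. Note that in current Transformers the attribute on the generation config is called `alignment_heads`; the checkpoint name and the head indices below are made up, you'd find the real ones for your model using the gist:

```python
from transformers import WhisperForConditionalGeneration

# Placeholder checkpoint name for a fine-tuned Whisper model.
model = WhisperForConditionalGeneration.from_pretrained("your-fine-tuned-whisper")

# Each entry is [decoder_layer, head]. These values are invented for the
# example; a fine-tuned model often needs a different set of alignment
# heads than the base checkpoint it was trained from.
model.generation_config.alignment_heads = [[2, 2], [3, 0], [3, 2], [3, 3]]
```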
All CoreML tensors are 5-dimensional, but I'm not really sure what happens when the first two dimensions (batch size and sequence length) are not actually 1.
The texture is actually a `texture2d_array`, which has multiple texture slices. Each slice has 4 channels.
No, you just need the one `texture2d_array inTexture [[texture(0)]]`. Notice that its type is `texture2d_array`, not `texture2d`. When you read a pixel, you also specify the slice to read from:...
Almost. Z would go from 0 up to (but not including) 32, since you need to divide the number of channels by 4 (because each texture slice contains 4 channels).
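To make that concrete, a minimal sketch of a kernel that loops over all the slices at one (x, y) position; the 32 slices come from an assumed 128 channels / 4 channels per slice, and the kernel and texture names are made up:

```metal
#include <metal_stdlib>
using namespace metal;

// Reads every channel slice at one (x, y) position and copies it to the
// output. With 128 channels there are 128 / 4 = 32 slices of 4 channels.
kernel void copyAllSlices(
    texture2d_array<half, access::read> inTexture [[texture(0)]],
    texture2d_array<half, access::write> outTexture [[texture(1)]],
    uint3 gid [[thread_position_in_grid]])
{
    if (gid.x >= inTexture.get_width() || gid.y >= inTexture.get_height()) return;

    for (uint slice = 0; slice < inTexture.get_array_size(); ++slice) {
        // Each read returns 4 channels packed into a half4; the second
        // argument selects which slice of the array to read from.
        half4 v = inTexture.read(uint2(gid.x, gid.y), slice);
        outTexture.write(v, uint2(gid.x, gid.y), slice);
    }
}
```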
Hi @nikhilsinghmus, thanks for looking into this! Unfortunately the Replicate website says, "Uh oh! This model can't be run on Replicate because it was built with a version of Cog...
Cheers, I'll have a look. :-) BTW, to test the model with my script I literally used the images from your project page at https://web.media.mit.edu/~nsingh1/image2reverb/
Hi @nikhilsinghmus, that's awesome. Thanks so much for looking into this! Do you think that maybe Trainer sets the random seed to a specific value, and that the model depends...