pfeatherstone
I have to say, the results were really poor. The better option was to convert my 1D data into 2D using torch.stft() and then use a normal 2D model...
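For reference, a minimal sketch of that kind of conversion, assuming a magnitude spectrogram and made-up STFT parameters:
```
import torch

# hypothetical example: turn a batch of 1D signals into 2D "images"
# so a standard 2D model can be applied
x = torch.randn(4, 16000)                  # [batch, samples]
n_fft, hop = 512, 128                      # assumed STFT parameters
window = torch.hann_window(n_fft)

spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                  window=window, return_complex=True)   # [batch, freq, time], complex
spec = spec.abs().unsqueeze(1)             # magnitude + channel dim -> [batch, 1, freq, time]
# spec can now be fed to an ordinary 2D CNN with a single input channel
```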
Tried this again optimistically thinking I would get better results. Nope...
I missed the `== 0`. You could just use:
```
bias = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1).view(1, 1, 5, 5)
attn = attn.masked_fill(bias[:, :, :T, :T], float('-inf'))
```
With `triu(..., diagonal=1)` the mask is already True exactly where you want `-inf`, so no `== 0` is needed; making it boolean and 4-D keeps the `[:, :, :T, :T]` slice and `masked_fill` happy.
Very interested in this. I'm training two models at once and can only use batch sizes of less than 5 on my machine... So gradient accumulation would be great
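In case it helps, a minimal gradient-accumulation sketch; the model, data, and hyper-parameters below are placeholders:
```
import torch
from torch import nn

model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.MSELoss()
accum_steps = 4                                   # effective batch = accum_steps * micro-batch size

data = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(16)]  # micro-batches of 4

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = criterion(model(x), y) / accum_steps   # scale so summed grads average over the big batch
    loss.backward()                               # gradients accumulate in the .grad buffers
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```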
I did have a branch at some point in the past that cleaned up all of dlib's cmake stuff including CUDA. I can try to revive that at some point.
I could be naive here, but is there a reason why LayerNorm isn't using cuDNN? Will `cudnnNormalizationForwardInference`, `cudnnNormalizationForwardTraining` and `cudnnNormalizationBackward` work? It looks like those functions could be used...
Somewhere in the docs I read that it could be used for multiple types of normalization... I agree, it's hard to believe cuDNN doesn't have first-class support for it. Maybe...
In fact I have a 1D complex tensor disguised as a 2D tensor, and I want to extract the real and imaginary parts of both the even and odd samples...
Do a reshape first to go from `[B, 2]` to `[B//2, 2, 2]`, then slice appropriately, with some flattening if required, e.g. the sketch below.
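A small sketch of that slicing, assuming rows are consecutive samples and the columns are (real, imag):
```
import torch

B = 8
x = torch.randn(B, 2)          # rows = consecutive samples, columns = (real, imag)  [assumed layout]

y = x.reshape(B // 2, 2, 2)    # group (even, odd) sample pairs: [B, 2] -> [B//2, 2, 2]

even_real = y[:, 0, 0]         # real parts of even samples
even_imag = y[:, 0, 1]         # imaginary parts of even samples
odd_real  = y[:, 1, 0]         # real parts of odd samples
odd_imag  = y[:, 1, 1]         # imaginary parts of odd samples
```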
@nicolaspanel I'm stuck on this again. Can you help?