aft-pytorch

Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

Results 3 aft-pytorch issues

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for baddbmm). I set .cuda()...

I use the aft_full model with 6 layers, and I set it up in `__init__` with this code:

```
self.encoder_transformer = nn.ModuleList()
for _ in range(6):
    self.encoder_transformer.append(AFTFull(max_seqlen=500, dim=512, hidden_dim=256))
```

In the forward function, I...

I want to migrate an existing LLM to this architecture. There is an additional parameter, w. How should I initialize it?