torchscale
Where is the offset implemented in multi-head dilated attention?