torchscale: About running speed

Open NieShenRuc opened this issue 1 year ago • 0 comments

Thanks for your excellent work! I have noticed that torchscale computes the projections of x to q, k, and v sequentially, as three separate projections in lines 84-86 of torchscale/component/multihead_attention.py. Would this be slower than computing them in a single fused projection, for example self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)?

Mar 28 '23 07:03 NieShenRuc
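A minimal sketch of the comparison being asked about, assuming PyTorch and arbitrary illustrative sizes (embed_dim = 64, a batch of 2 sequences of length 10). The three separate nn.Linear layers are hypothetical stand-ins for the q/k/v projections, not code copied from torchscale. The sketch copies their weights into one fused qkv_proj, checks that both versions produce the same q, k, and v, and times them on CPU. The fused version replaces three matrix multiplications with one larger one. Whether that is measurably faster depends on hardware and tensor sizes, so the printed timings are not a general result.

```python
import time

import torch
import torch.nn as nn

torch.manual_seed(0)
embed_dim, batch, seq_len = 64, 2, 10  # illustrative sizes only

# Separate projections: one nn.Linear each for q, k and v (hypothetical stand-ins).
q_proj = nn.Linear(embed_dim, embed_dim)
k_proj = nn.Linear(embed_dim, embed_dim)
v_proj = nn.Linear(embed_dim, embed_dim)

# Fused projection: one nn.Linear producing q, k and v in a single matmul.
qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)

# nn.Linear stores weight as (out_features, in_features), so stacking the three
# weights along dim 0 makes the fused layer compute exactly the same function.
with torch.no_grad():
    qkv_proj.weight.copy_(torch.cat([q_proj.weight, k_proj.weight, v_proj.weight], dim=0))
    qkv_proj.bias.copy_(torch.cat([q_proj.bias, k_proj.bias, v_proj.bias], dim=0))

x = torch.randn(batch, seq_len, embed_dim)

with torch.no_grad():
    # Three matrix multiplications, issued one after another.
    q1, k1, v1 = q_proj(x), k_proj(x), v_proj(x)
    # One larger matrix multiplication, then a split along the feature dimension.
    q2, k2, v2 = qkv_proj(x).chunk(3, dim=-1)


def bench(fn, iters=100):
    """Average wall-clock time per call on CPU.

    On a GPU, call torch.cuda.synchronize() before reading the clock,
    because CUDA kernels are launched asynchronously.
    """
    fn()  # warm-up call
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


with torch.no_grad():
    t_sep = bench(lambda: (q_proj(x), k_proj(x), v_proj(x)))
    t_fused = bench(lambda: qkv_proj(x).chunk(3, dim=-1))

print(f"separate: {t_sep * 1e6:.1f} us/call, fused: {t_fused * 1e6:.1f} us/call")
```

The fused layer only applies directly when q, k, and v are all projected from the same input x. If an attention module receives distinct query, key, and value tensors (as in encoder-decoder attention), each projection has to be applied to a different input, so a single fused call no longer covers all three.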
