Brief question about model structure
We know that for QKV attention, the result of q @ k should be divided by sqrt(d). Will this also be the case for EfficientViT?
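For reference, this is the standard scaled dot-product attention I am referring to (a minimal single-head sketch in PyTorch, not code from this repo; the 1/sqrt(d) scale keeps the logits in a range where softmax does not saturate):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d)
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # scale by 1/sqrt(d)
    weights = F.softmax(scores, dim=-1)          # softmax over keys
    return weights @ v
```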
Does ReLU-based linear attention need LayerNorm or positional embeddings?
Does ReLU-based linear attention need multi-head attention? A sketch of my understanding is below.
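To make the questions concrete, here is my understanding of ReLU-based linear attention as a single-head sketch (an assumption on my part, following the usual kernelized linear-attention normalization ReLU(Q)(ReLU(K)^T V) / (ReLU(Q) ReLU(K)^T 1), not necessarily the exact EfficientViT implementation). The softmax is replaced by ReLU feature maps, so attention can be computed in time linear in sequence length; my questions are whether the 1/sqrt(d) scale, LayerNorm / positional embeddings, and multiple heads are still needed around this:

```python
import torch

def relu_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq_len, d)
    q = torch.relu(q)                 # ReLU feature map on queries
    k = torch.relu(k)                 # ReLU feature map on keys
    kv = k.transpose(-2, -1) @ v      # (d, d) context matrix, linear in seq_len
    num = q @ kv                      # numerator: ReLU(Q) (ReLU(K)^T V)
    den = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer: ReLU(Q) ReLU(K)^T 1
    return num / (den + eps)
```

Note there is no softmax here, which is why I am unsure whether the sqrt(d) scaling from standard attention still applies.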