mamba
Clarification on how to interpret kernel size for conv1d
Can we interpret the convolution kernel size as the context length? Would increasing the kernel size allow longer-range context?
No, the context length is whatever sequence length you use as input. We typically use a kernel size of 2, 3, or 4 for the conv1d.
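To make the distinction concrete, here is a minimal pure-Python sketch of a causal 1D convolution on a single channel. It is not the repository's implementation. The helper names `causal_conv1d` and `receptive_positions` are made up for illustration, and the uniform weights are arbitrary. It shows that the kernel size only sets how many neighboring positions the conv mixes locally, while the output length follows whatever sequence length is fed in.

```python
def causal_conv1d(x, weights):
    """Causal 1D convolution over one channel (illustrative helper, not Mamba's code).

    x: list of floats of length L (the sequence length).
    weights: list of floats of length K (the kernel size).
    Output t depends only on x[t-K+1 .. t]; the left edge is zero-padded.
    """
    k = len(weights)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(w * padded[t + j] for j, w in enumerate(weights))
        for t in range(len(x))
    ]


def receptive_positions(seq_len, kernel_size, t):
    """Input indices that can influence conv output t, found by probing with impulses."""
    weights = [1.0] * kernel_size
    hits = []
    for i in range(seq_len):
        impulse = [0.0] * seq_len
        impulse[i] = 1.0
        if causal_conv1d(impulse, weights)[t] != 0.0:
            hits.append(i)
    return hits


# Same kernel size (4), two different sequence lengths.
for seq_len in (8, 64):
    out = causal_conv1d([1.0] * seq_len, [0.25] * 4)
    # The output is as long as the input; the conv itself only sees 4 positions.
    print(seq_len, len(out), receptive_positions(seq_len, 4, seq_len - 1))
```

With kernel size 4, the last output of the conv alone is influenced by just the last four inputs, whether the sequence has 8 or 64 tokens. In Mamba, information from further back is carried by the selective SSM that follows the conv, which is why the kernel size does not define the context length.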