mamba
Clarification on how to interpret kernel size for conv1d
Can we interpret the convolution kernel size as the context length? Would increasing the kernel size allow longer-range context?
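To make the question concrete, here is a minimal sketch of a single-channel causal 1D convolution in isolation. It assumes left padding of `k - 1` zeros so that each output depends only on the current and past inputs. The helpers `causal_conv1d` and `influencing_inputs` are hypothetical names used only for illustration, not functions from the mamba codebase. The sketch finds which input positions can affect output position `t` by perturbing one input at a time. It models only the convolution layer's own window. In Mamba the convolution is followed by the selective SSM, which this sketch does not model.

```python
def causal_conv1d(x, w):
    """Hypothetical single-channel causal conv: y[t] depends on x[t-k+1 .. t]."""
    k = len(w)
    # Left-pad with k - 1 zeros so no output position looks at future inputs.
    padded = [0.0] * (k - 1) + list(x)
    return [sum(w[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


def influencing_inputs(t, k, seqlen):
    """Return input indices whose perturbation changes output y[t] (conv only)."""
    w = [1.0] * k  # all-ones kernel so every tap contributes
    base = causal_conv1d([0.0] * seqlen, w)
    hits = []
    for i in range(seqlen):
        x = [0.0] * seqlen
        x[i] = 1.0  # impulse at position i
        if causal_conv1d(x, w)[t] != base[t]:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Output position 9 sees exactly k inputs ending at position 9.
    print(influencing_inputs(9, 4, 16))  # kernel size 4
    print(influencing_inputs(9, 8, 16))  # kernel size 8
```

Within the convolution alone, the window grows linearly with the kernel size: with `k = 4`, output 9 sees inputs 6 to 9, and with `k = 8` it sees inputs 2 to 9.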