mamba
Does mamba support data packing?
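To accelerate training, it is common practice to pack multiple text sequences into a single training sequence. Self-attention avoids cross-sample contamination by combining the causal mask with sequence IDs, so that tokens only attend to tokens from the same original sequence. I do not see a similar mechanism built into mamba, where the recurrent state would otherwise carry over from one packed sequence into the next. Does mamba support data packing?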
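To make the question concrete, here is a minimal NumPy sketch of what I mean. It is not Mamba's API: `pack`, `packed_causal_mask`, and `toy_scan` are hypothetical helpers, and `toy_scan` is a plain scalar linear recurrence standing in for a state-space layer. It shows the attention-side mask built from sequence IDs, and the kind of state reset at sequence boundaries that a recurrent model would presumably need for packing to be contamination-free.

```python
import numpy as np

def pack(seqs):
    """Concatenate sequences; return the packed tokens and a per-token sequence ID."""
    tokens = np.concatenate(seqs)
    seq_ids = np.concatenate([np.full(len(s), i) for i, s in enumerate(seqs)])
    return tokens, seq_ids

def packed_causal_mask(seq_ids):
    """mask[i, j] is True where position i may attend to position j:
    j must not be in the future and must belong to the same sequence."""
    n = len(seq_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))
    same_seq = seq_ids[:, None] == seq_ids[None, :]
    return causal & same_seq

def toy_scan(x, a=0.9, seq_ids=None):
    """Toy recurrence h_t = a * h_{t-1} + x_t (an assumption, not Mamba's kernel).
    If seq_ids is given, the state is reset to zero at each sequence boundary."""
    h = 0.0
    out = np.empty(len(x), dtype=float)
    for t in range(len(x)):
        if seq_ids is not None and t > 0 and seq_ids[t] != seq_ids[t - 1]:
            h = 0.0  # drop the state carried over from the previous sequence
        h = a * h + x[t]
        out[t] = h
    return out

seqs = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]
tokens, seq_ids = pack(seqs)
mask = packed_causal_mask(seq_ids)

separate = np.concatenate([toy_scan(s) for s in seqs])  # reference: no packing
with_reset = toy_scan(tokens, seq_ids=seq_ids)          # packed, state reset
no_reset = toy_scan(tokens)                             # packed, state leaks
```

In this toy setting, `with_reset` matches `separate`, while `no_reset` differs for every token of the second sequence because the state from the first sequence leaks into it. That leakage is the contamination I am asking about.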