Wikitext pipeline
Hi, could you please share the evaluation pipeline for the WikiText dataset? The paper reports 16.3 perplexity for Mamba and 18 for the transformer baseline (vs. the 18.6 reported elsewhere), and I cannot reproduce these numbers. Perhaps there is a difference in preprocessing or evaluation setup. Could you provide details on the preprocessing steps or hyperparameters that differ from the defaults? Understanding those differences would help me reproduce the results.
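
For reference, here is a minimal sketch of the sliding-window perplexity evaluation I am currently running for the transformer baseline (GPT-2 via Hugging Face `transformers` as a stand-in; the model, the 1024-token context, and the 512-token stride are my own assumptions, not taken from your setup):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

# Concatenate the raw test split, as in the usual WikiText evaluation.
test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = 1024   # assumed context window
stride = 512        # assumed sliding-window stride
seq_len = encodings.input_ids.size(1)

nll_sum, n_tokens = 0.0, 0
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    target_len = end - prev_end  # only score tokens not scored in a previous window
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-target_len] = -100  # mask context-only tokens from the loss

    with torch.no_grad():
        out = model(input_ids, labels=target_ids)

    # out.loss is the mean NLL over scored positions (labels are shifted by one internally).
    num_scored = (target_ids != -100).sum().item() - 1
    nll_sum += out.loss.item() * num_scored
    n_tokens += num_scored

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.tensor(nll_sum / n_tokens))
print(f"WikiText-103 test perplexity: {ppl.item():.2f}")
```

If your pipeline tokenizes, segments, or strides the test set differently (e.g. document-level evaluation instead of a concatenated stream), that alone could explain the gap I am seeing.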