Nikita Balagansky
Hi, @marcusau! In this repository, I use the dataset for the masked language modelling task. The example in the `data` folder is just there for fast testing in CI. **So, for now,...
Well, can you tell me more specifically about your task? What is the language of your dataset? Which model are you going to use as a teacher? If you need your...
Okay, you can pass something like this: https://gist.github.com/elephantmipt/4287f5792a4c1e716d2f62db623646cf . Don't forget to specify the path to your dataset and the text_field in the config above. You can run it with `catalyst-dl run -C...
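Before launching, it may help to sanity-check that the dataset path and text field you are about to put in the config actually match your data. The sketch below is only illustrative: the path `data/my_dataset.csv` and the column name `text` are placeholders, not values from the gist.

```python
# Quick sanity check before editing the config: make sure the dataset file
# exists and actually contains the column you will pass as text_field.
# The path and column name below are placeholders -- adjust to your data.
import pandas as pd

DATASET_PATH = "data/my_dataset.csv"   # the path you put in the config
TEXT_FIELD = "text"                    # the column you pass as text_field

df = pd.read_csv(DATASET_PATH)
assert TEXT_FIELD in df.columns, (
    f"Column '{TEXT_FIELD}' not found; available columns: {list(df.columns)}"
)
print(f"{len(df)} rows, sample text: {df[TEXT_FIELD].iloc[0][:100]!r}")
```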
Thank you for the detailed reply! I found that the smallest mamba-130m model uses 24 layers instead of 12, according to the [config](). Is this the case for the wikitext...
Thank you for clarifying! I found it confusing that the README file mentions a 12-layer model (https://github.com/state-spaces/mamba/blob/2ee7fd287a8f5c826af6f69ae3aad4682c4afd15/README.md?plain=1#L85), while on Hugging Face there is a 24-layer model.
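For anyone else checking this, the layer count can be read straight from the checkpoint's `config.json` on the Hub. This is a minimal sketch, assuming the checkpoint lives at `state-spaces/mamba-130m` and uses an `n_layer` key; adjust if the config schema differs.

```python
# Read the layer count of a Mamba checkpoint from its config on the HF Hub.
# Repo id and the "n_layer" key are assumptions based on the public
# state-spaces checkpoints.
import json
from huggingface_hub import hf_hub_download

config_path = hf_hub_download("state-spaces/mamba-130m", "config.json")
with open(config_path) as f:
    config = json.load(f)

print(config.get("n_layer"))  # expected to print 24 for mamba-130m
```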
> The README mentions the double layer count right below the table, do you have a suggestion for a presentation that would be more clear?

I think 96ec4e4 solved all...