Reference for pretraining other small language models
The README mentions this codebase can act as a "reference for enthusiasts keen on pretraining language models under 5 billion parameters". I'm wondering if you could give a brief guide on how to do so, assuming we start from a transformers config and tokenizer. Something like:
```json
{
  "architectures": [
    "..."
  ],
  ...
  "model_type": "...",
  "num_hidden_layers": 12,
  ...
}
```
Is a lot of work required to change the codebase to support this?
https://github.com/jzhang38/TinyLlama/blob/main/lit_gpt/config.py
You can pick one of the existing configs from here or create your own.
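For illustration, here is a rough sketch of how a Hugging Face transformers `config.json` for a Llama-style model might be translated into an entry for the `configs` list in `lit_gpt/config.py`. The field names on the lit_gpt side (`n_layer`, `n_head`, `n_embd`, `block_size`, etc.) are based on the existing entries in that file; the helper name and the exact mapping below are assumptions, so double-check them against the `Config` dataclass before relying on this.

```python
import json

# Hypothetical helper: map a Hugging Face transformers config.json
# (Llama-style) onto the dict layout used by entries in lit_gpt/config.py.
# The target field names follow the existing entries in that file;
# verify them against the Config dataclass.
def hf_config_to_lit_gpt(path: str, name: str) -> dict:
    with open(path) as f:
        hf = json.load(f)
    return dict(
        name=name,                                 # looked up via Config.from_name(name)
        block_size=hf["max_position_embeddings"],  # context length
        vocab_size=hf["vocab_size"],
        padding_multiple=64,
        n_layer=hf["num_hidden_layers"],
        n_head=hf["num_attention_heads"],
        n_embd=hf["hidden_size"],
        rotary_percentage=1.0,                     # full rotary embeddings, as in Llama
        parallel_residual=False,
        bias=False,
        norm_eps=hf.get("rms_norm_eps", 1e-5),
        _norm_class="FusedRMSNorm",
        _mlp_class="LLaMAMLP",
        intermediate_size=hf["intermediate_size"],
        n_query_groups=hf.get("num_key_value_heads", hf["num_attention_heads"]),
    )

if __name__ == "__main__":
    # Print the dict so it can be pasted into the `configs` list in
    # lit_gpt/config.py under a new name.
    print(hf_config_to_lit_gpt("config.json", "my_small_llama"))
```

Once a new entry like this is added to the `configs` list, the pretraining script should be able to select it by name (the same way the TinyLlama configs are selected via `Config.from_name`), and your tokenizer just needs to be consistent with the `vocab_size` you declare.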