llama.cpp
llama : model-based max number of graph nodes
fix #8615
This PR proposes determining the max number of graph nodes from the model info (arch, hparams, etc.) instead of using a single fixed limit.
- [x] I have read the contributing guidelines
- Self-reported review complexity:
- [ ] Low
- [ ] Medium
- [ ] High
OK, I will merge this after the 405B model is released and the need for this change is confirmed. The proposed `n_layer > 400` check will likely have to be updated, because that number seems too big to me.
@ggerganov what's left to do here at this point?
I haven't noticed any reports of 405B failing, so I removed the increased max nodes limit for now and plan to merge just the new `llama_model_max_nodes` function.
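
For readers following along, here is a rough sketch of the shape such a function could take. It is not the code from this PR: the `sketch_*` types and names are hypothetical stand-ins for llama.cpp's internal model structures, and the two limit values are placeholder assumptions. Only the `n_layer > 400` threshold comes from the discussion above. With the raised limit disabled, which is the default here, every model gets the same node budget, mirroring the decision to keep only the hook for now.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for llama.cpp's internal types (not the real definitions).
enum sketch_arch { SKETCH_ARCH_LLAMA, SKETCH_ARCH_OTHER };

struct sketch_hparams {
    uint32_t n_layer; // number of transformer layers
};

struct sketch_model {
    sketch_arch    arch;
    sketch_hparams hparams;
};

// Placeholder limits, assumed for illustration only.
constexpr size_t SKETCH_DEFAULT_MAX_NODES = 8192;
constexpr size_t SKETCH_LARGE_MAX_NODES   = 32768;

// Returns the graph node budget for a model.
// When raise_for_deep_models is false (the default), the function is a pure
// hook that always returns the default budget. When true, very deep LLaMA
// models (n_layer > 400, the threshold questioned in the review) get more.
static size_t sketch_model_max_nodes(const sketch_model & model,
                                     bool raise_for_deep_models = false) {
    if (raise_for_deep_models &&
        model.arch == SKETCH_ARCH_LLAMA &&
        model.hparams.n_layer > 400) {
        return SKETCH_LARGE_MAX_NODES;
    }
    return SKETCH_DEFAULT_MAX_NODES;
}

// Small self-check showing both behaviours.
static void sketch_self_check() {
    const sketch_model small = { SKETCH_ARCH_LLAMA, { 32 } };
    const sketch_model deep  = { SKETCH_ARCH_LLAMA, { 500 } };

    // Hook only: every model gets the default budget.
    assert(sketch_model_max_nodes(small) == SKETCH_DEFAULT_MAX_NODES);
    assert(sketch_model_max_nodes(deep)  == SKETCH_DEFAULT_MAX_NODES);

    // With the raised limit enabled, only the deep model changes.
    assert(sketch_model_max_nodes(small, true) == SKETCH_DEFAULT_MAX_NODES);
    assert(sketch_model_max_nodes(deep,  true) == SKETCH_LARGE_MAX_NODES);
}
```

Routing the limit through one function means a later change for very large models would touch only this function rather than every place a graph is allocated.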