open_llama
What learning rate was used to pretrain the 3B model?