llama2.c
I found that the `dim` parameter mainly affects the final training loss, while `n_layers` mainly affects the training speed.
The smaller configuration took 30 minutes to train. The configuration with more layers reached a loss of about 2, but training took 3 hours.
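The tradeoff above (larger model, lower loss, slower training) follows roughly from parameter count. Below is a minimal sketch that estimates parameters for a llama-style decoder from `dim` and `n_layers`; the formula is a standard approximation and the two example configs are hypothetical, not necessarily the ones used here:

```python
# Rough transformer parameter-count estimate, illustrating why increasing
# dim or n_layers slows training. Configs below are hypothetical examples.

def approx_params(dim: int, n_layers: int, vocab_size: int = 32000) -> int:
    """Approximate parameter count of a llama-style decoder.

    Per layer: 4*dim^2 for attention projections (wq, wk, wv, wo) plus
    roughly 8*dim^2 for the SwiGLU feed-forward block.
    Embeddings: vocab_size * dim (often weight-tied with the output head).
    """
    per_layer = 4 * dim * dim + 8 * dim * dim
    return n_layers * per_layer + vocab_size * dim

small = approx_params(dim=288, n_layers=6)
large = approx_params(dim=768, n_layers=12)
print(f"small: ~{small / 1e6:.1f}M params, large: ~{large / 1e6:.1f}M params")
```

Since training cost scales with parameter count (and `n_layers` multiplies the per-layer cost directly), a config with double the layers and a larger `dim` can easily take several times longer per epoch, which is consistent with the 30-minute vs. 3-hour gap.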