llama
Takes too much time to load the model
Loading the model takes too much time. For example, with batch size = 1, it takes about 252.89 s and 880 s to load llama-13b and llama-30b, respectively. Are there faster approaches?
The second load may be faster.
Make sure to cache or store the weights on a fast NVMe drive before loading.
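To check whether the disk is the bottleneck, one option is to time a plain sequential read of a checkpoint file. This is only a rough sketch: `read_throughput_mb_s` is a hypothetical helper, not part of this repo, and the path in the usage note is a placeholder for wherever your weights are stored.

```python
import time


def read_throughput_mb_s(path, chunk_size=64 * 1024 * 1024):
    """Read the file at `path` sequentially and return throughput in MiB/s.

    Hypothetical helper for illustration; it only measures raw file reads,
    not deserialization or the transfer of weights to the GPU.
    """
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total_bytes / (1024 * 1024) / max(elapsed, 1e-9)
```

Calling it twice on the same file, for example `read_throughput_mb_s("/path/to/checkpoint")`, gives a rough comparison between a first and a second read. If the second call is much faster, the operating system's file cache is probably serving the data, which would be consistent with the second load being faster. If even the first read is fast but loading is still slow, the time is likely going somewhere other than disk I/O.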