feat: Parallelise Model Loading

Open vovw opened this issue 1 year ago • 3 comments

Test using:

exo --preload-models llama-3.2-1b,llama-3.1-8b

Oct 17 '24 21:10 vovw

Almost what I envisioned. The only thing I would change is to preload after the preemptive download: we don't want to download every possible model shard, only the relevant one.
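A minimal sketch of that ordering, assuming an asyncio-based flow: each model's relevant shard is downloaded first and only then loaded, while separate models proceed in parallel. Every function name here (`download_shard`, `load_shard`, `preload_model`, `preload_models`, `run_preload`) is hypothetical and not taken from the exo codebase; the sleeps stand in for the real download and load work.

```python
import asyncio

# Hypothetical stand-ins: none of these names come from the exo codebase.

async def download_shard(model_id: str, log: list) -> str:
    """Pretend to download only the shard this node needs for model_id."""
    await asyncio.sleep(0)  # placeholder for the real network download
    log.append(("download", model_id))
    return f"/tmp/{model_id}.shard"  # made-up path, for illustration only

async def load_shard(model_id: str, path: str, log: list) -> None:
    """Pretend to load the already-downloaded shard into memory."""
    await asyncio.sleep(0)  # placeholder for the real model load
    log.append(("load", model_id))

async def preload_model(model_id: str, log: list) -> None:
    # Preload strictly after the download of this model's shard has finished.
    path = await download_shard(model_id, log)
    await load_shard(model_id, path, log)

async def preload_models(model_ids: list, log: list) -> None:
    # Different models are handled concurrently.
    await asyncio.gather(*(preload_model(m, log) for m in model_ids))

def run_preload(model_ids: list) -> list:
    """Run the preload and return the recorded sequence of events."""
    log = []
    asyncio.run(preload_models(model_ids, log))
    return log
```

Calling `run_preload(["llama-3.2-1b", "llama-3.1-8b"])` returns the recorded events, which makes it easy to check that no model is loaded before its download completes.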

Oct 18 '24 01:10 AlexCheema

@AlexCheema I think I got it. Can you review the changes?

Oct 18 '24 09:10 vovw

Tested on an M3 Pro.

Oct 19 '24 23:10 vovw

Closed this one because it has been more than a month since the last PR. I synced my fork, and the new PR is at #466.

Nov 16 '24 15:11 vovw