exo
feat: Parallelise Model Loading
Test using:
exo --preload-models llama-3.2-1b,llama-3.1-8b
Almost what I envisioned. The only thing I would change is to preload after the preemptive download: we don't want to download all possible model shards, only the relevant ones.
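A minimal sketch of the ordering suggested above: each model first downloads only the shard this node is responsible for, then preloads it, and the per-model pipelines run concurrently. The helpers download_shard and load_shard, the layer-range shard representation, and the shard_for mapping are all hypothetical stand-ins, not exo's actual API.

```python
import asyncio

# Hypothetical stand-ins for illustration only; these are NOT exo's real APIs.
async def download_shard(model_id: str, shard: tuple[int, int]) -> str:
    """Pretend to download only the given layer range of a model."""
    await asyncio.sleep(0.01)  # simulate network I/O
    return f"{model_id}[{shard[0]}:{shard[1]}]"

async def load_shard(path: str) -> str:
    """Pretend to load an already-downloaded shard into memory."""
    await asyncio.sleep(0.01)  # simulate disk/accelerator load time
    return f"loaded:{path}"

async def download_then_preload(model_id: str, shard: tuple[int, int]) -> str:
    # Download first, so only the relevant shard is fetched, then preload it.
    path = await download_shard(model_id, shard)
    return await load_shard(path)

async def preload_models(spec: str, shard_for: dict[str, tuple[int, int]]) -> list[str]:
    # Parse a comma-separated list such as "llama-3.2-1b,llama-3.1-8b".
    model_ids = [m.strip() for m in spec.split(",") if m.strip()]
    # Run each model's download-then-preload pipeline concurrently;
    # gather returns results in the same order as model_ids.
    return await asyncio.gather(
        *(download_then_preload(m, shard_for[m]) for m in model_ids)
    )

if __name__ == "__main__":
    shards = {"llama-3.2-1b": (0, 16), "llama-3.1-8b": (0, 32)}
    print(asyncio.run(preload_models("llama-3.2-1b,llama-3.1-8b", shards)))
```

Keeping download and preload sequential within a model, but parallel across models, avoids loading anything that was never downloaded while still overlapping the slow I/O of different models.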
@AlexCheema I think I got it. Can you review the changes?
Tested on an M3 Pro.
Closed this one because it has been more than a month since the last PR. I synced my fork, and the new PR is at #466.