[BOUNTY - $100] Parallelise Model Loading

Open AlexCheema opened this issue 1 year ago • 3 comments

Right now we already parallelise model downloads, which works great and speeds things up a lot.
However, loading the model into memory can also be slow for large models, and right now exo does this sequentially, one node at a time.
In a similar way to model downloads, we should parallelise loading the model into memory.
This should probably expose some of the functionality of ensure_shard in the abstract InferenceEngine, maybe as something like preload_model.
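
A rough sketch of what this could look like, to make the idea concrete. Only the names ensure_shard, InferenceEngine and preload_model come from this issue; the class bodies, FakeEngine, preload_all and the string shard identifiers are hypothetical stand-ins, not exo's actual code. The slow load is simulated with a sleep so the example runs on its own.

```python
import asyncio
import time
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Hypothetical stand-in for the abstract engine; not exo's real class."""

    @abstractmethod
    async def ensure_shard(self, shard: str) -> None:
        """Load the given shard into memory if it is not already loaded."""

    async def preload_model(self, shard: str) -> None:
        # Proposed hook (assumption): by default it just delegates to ensure_shard.
        await self.ensure_shard(shard)


class FakeEngine(InferenceEngine):
    """Hypothetical engine that simulates a slow load with a sleep."""

    def __init__(self, load_seconds: float) -> None:
        self.load_seconds = load_seconds
        self.loaded = set()

    async def ensure_shard(self, shard: str) -> None:
        if shard in self.loaded:
            return  # already in memory, nothing to do
        await asyncio.sleep(self.load_seconds)  # placeholder for the real load
        self.loaded.add(shard)


async def preload_all(assignments) -> None:
    """Start every node's load at once instead of one node at a time."""
    await asyncio.gather(
        *(engine.preload_model(shard) for engine, shard in assignments)
    )


async def demo():
    # Three simulated nodes, each taking 0.2 s to load its shard.
    assignments = [(FakeEngine(0.2), f"layers-{i}") for i in range(3)]
    start = time.perf_counter()
    await preload_all(assignments)
    elapsed = time.perf_counter() - start
    return elapsed, assignments


if __name__ == "__main__":
    elapsed, _ = asyncio.run(demo())
    print(f"preloaded 3 simulated nodes in {elapsed:.2f}s")
```

With the loads started together, the total wait is roughly the slowest single load rather than the sum of all of them. A real engine whose load is a blocking call would probably need to push it off the event loop (for example with asyncio.to_thread) for the loads to actually overlap.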

Sep 05 '24 15:09 AlexCheema

@AlexCheema I submitted a PR: https://github.com/exo-explore/exo/pull/211. I hope it resolves your issue.

If you feel like supporting me:

https://buymeacoffee.com/aybanda

Sep 09 '24 11:09 aybanda

Tried to take a turn at this in #360, lmk if any changes are required.

Oct 17 '24 21:10 vovw

Is this bounty still open?

Nov 12 '24 15:11 minhdv82