
[BOUNTY - $100] Parallelise Model Loading

Open AlexCheema opened this issue 1 year ago • 3 comments

  • Right now we already parallelise model downloads, which works great and speeds things up a lot
  • However, loading the model into memory can also be slow for large models, and right now exo does this sequentially, one node at a time
  • In a similar way to model downloads, we should parallelise model loading into memory
  • This should probably expose some of the functionality of ensure_shard in the abstract InferenceEngine, maybe as something like preload_model
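
A rough sketch of what this could look like, not the actual exo API: the `InferenceEngine` below is a simplified, hypothetical stand-in that keeps only the `ensure_shard` and `preload_model` names from the list above. The string shard IDs, the `DummyEngine` class, and the `preload_all` helper are assumptions made purely for illustration. The idea is that each node's engine exposes a `preload_model` coroutine, and the caller starts all of them together instead of awaiting them one node at a time.

```python
import asyncio
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Hypothetical, simplified stand-in for the abstract engine (not exo's real class)."""

    @abstractmethod
    async def ensure_shard(self, shard: str) -> None:
        """Load the given shard into memory if it is not loaded yet."""

    async def preload_model(self, shard: str) -> None:
        # Assumed hook: expose shard loading so callers can trigger it ahead of inference.
        await self.ensure_shard(shard)


class DummyEngine(InferenceEngine):
    """Illustrative engine that simulates a slow load with a sleep."""

    def __init__(self, load_seconds: float) -> None:
        self.load_seconds = load_seconds
        self.loaded = set()

    async def ensure_shard(self, shard: str) -> None:
        if shard in self.loaded:
            return  # already in memory, nothing to do
        await asyncio.sleep(self.load_seconds)  # stands in for reading weights
        self.loaded.add(shard)


async def preload_all(engines_and_shards) -> None:
    """Start every node's load at once and wait for all of them to finish."""
    await asyncio.gather(
        *(engine.preload_model(shard) for engine, shard in engines_and_shards)
    )
```

With three simulated nodes that each take the same time to load, `asyncio.run(preload_all(...))` should finish in roughly the time of one load rather than three. In a real cluster the loads would run on separate machines, so the coordinator would more likely send a preload request to each peer and wait on all the responses together.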

AlexCheema avatar Sep 05 '24 15:09 AlexCheema

@AlexCheema I submitted PR https://github.com/exo-explore/exo/pull/211. Hope it resolves your issue.

If you feel like supporting me:

https://buymeacoffee.com/aybanda

aybanda avatar Sep 09 '24 11:09 aybanda

Tried to take a turn at this in #360, lmk if there are any required changes.

vovw avatar Oct 17 '24 21:10 vovw

Is this bounty still open?

minhdv82 avatar Nov 12 '24 15:11 minhdv82