exo
[BOUNTY - $100] Parallelise Model Loading
- Right now we already parallelise model downloads, which works great and speeds things up a lot
- However, loading the model into memory can also be slow for large models, and right now exo does this sequentially, one node at a time
- In the same way as model downloads, we should parallelise loading the model into memory
- This should probably expose some functionality of `ensure_shard` in the abstract `InferenceEngine`, maybe something like `preload_model`
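A rough sketch of what this could look like, not exo's actual code. The `InferenceEngine` below is a stand-in with a simplified `ensure_shard` signature (shards are plain strings here), `preload_model` is the hypothetical hook suggested above, and `DummyEngine`, `preload_all` and `demo` are made up for illustration. The slow load is simulated with `asyncio.sleep`.

```python
import asyncio
import time
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Stand-in for the abstract engine; the real signatures may differ."""

    @abstractmethod
    async def ensure_shard(self, shard: str) -> None:
        """Make sure the given shard is loaded into memory."""

    async def preload_model(self, shard: str) -> None:
        # Hypothetical hook: load the shard ahead of the first inference call.
        await self.ensure_shard(shard)


class DummyEngine(InferenceEngine):
    """Illustrative engine whose load is simulated with a sleep."""

    def __init__(self, load_seconds: float) -> None:
        self.load_seconds = load_seconds
        self.loaded = None

    async def ensure_shard(self, shard: str) -> None:
        if self.loaded == shard:
            return  # already in memory, nothing to do
        await asyncio.sleep(self.load_seconds)  # simulated slow load
        self.loaded = shard


async def preload_all(pairs) -> None:
    # Start every node's load at once instead of awaiting them one by one.
    await asyncio.gather(*(engine.preload_model(shard) for engine, shard in pairs))


def demo(n_nodes: int = 3, load_seconds: float = 0.2):
    engines = [DummyEngine(load_seconds) for _ in range(n_nodes)]
    pairs = [(engine, f"shard-{i}") for i, engine in enumerate(engines)]
    start = time.perf_counter()
    asyncio.run(preload_all(pairs))
    return engines, time.perf_counter() - start


if __name__ == "__main__":
    engines, elapsed = demo()
    print(f"loaded {len(engines)} shards in {elapsed:.2f}s")
```

With three simulated nodes the total wait is roughly one load time rather than three. A real load is blocking work rather than a sleep, so an actual engine would probably need to push it to a thread or executor, and the preload request would have to reach each node over the network; both are left out here.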
@AlexCheema I submitted PR https://github.com/exo-explore/exo/pull/211. Hope it resolves your issue.
If you feel like supporting me:
https://buymeacoffee.com/aybanda
Tried to take a turn at this in #360. Let me know if any changes are required.
Is this bounty still open?