AMDMIGraphX
Add weight streaming at runtime
Figure out a way to support weight streaming at runtime, i.e. to fit large models on the GPU without needing to know the literal sizes ahead of time.
- [x] Define/determine an allocation of literals to be streamed
- [x] Move copy instructions to separate stream
- [x] Investigate why `@literal` instructions take up so much time
- [x] Decrease time spent on `@literal` instruction
- [ ] Move from naive allocation to "smart" allocation (figure out best way to mask time taken to copy)
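
To make the "naive allocation" item concrete, here is a minimal sketch of one possible naive policy: stream the largest literals until the rest fit within a GPU memory budget. This is not MIGraphX code. The `Literal` class, the `naive_streaming_plan` function, and the byte budget are all hypothetical names chosen for illustration, and the policy itself is an assumption about what "naive" could mean here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a model literal (weight tensor)."""
    name: str
    size_bytes: int


def naive_streaming_plan(literals, budget_bytes):
    """Split literals into (resident, streamed) lists.

    Assumed naive policy: mark the largest literals as streamed, one at a
    time, until the literals that remain resident fit within budget_bytes.
    It ignores when each literal is used, so it does nothing to hide copy
    latency behind compute.
    """
    resident = list(literals)
    streamed = []
    resident_bytes = sum(lit.size_bytes for lit in literals)

    # Visit literals from largest to smallest.
    for lit in sorted(literals, key=lambda l: l.size_bytes, reverse=True):
        if resident_bytes <= budget_bytes:
            break  # everything still resident now fits on the device
        resident.remove(lit)
        streamed.append(lit)
        resident_bytes -= lit.size_bytes

    return resident, streamed


if __name__ == "__main__":
    lits = [Literal("a", 400), Literal("b", 100), Literal("c", 300)]
    resident, streamed = naive_streaming_plan(lits, budget_bytes=450)
    print("resident:", [l.name for l in resident])
    print("streamed:", [l.name for l in streamed])
```

A "smart" allocation would presumably also take into account where each literal is first used in the program, so that copies on the separate stream can overlap with earlier compute. That part is left open in the sketch.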