AMDMIGraphX
Weight Stripping
Enable creating engines (currently MXR files, eventually perhaps dynamic objects) without embedding the weights in the engine.
Use cases: (1) Support compilation for various batch sizes without duplicating the weights. (2) Support multiple execution configurations with different quantization options (including mixed precision) without having to embed the weights in every engine created. (3) Multi-GPU execution may also benefit, especially when creating multiple multi-GPU execution configurations (partitions, execution schedules).
Technical considerations: How do we treat literals? Perhaps the MXR files need to contain the steps required to recreate the literals from the weights file, which may require a new type (finalized literals vs. future literals or meta-literals).
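To make the finalized vs. future literal distinction concrete, here is a minimal Python sketch of one way it could work. It is not MIGraphX code and uses no MIGraphX API: the names `FinalizedLiteral`, `FutureLiteral`, `resolve`, the `STEPS` table, and the weight name `fc.w` are all hypothetical, and a plain dict stands in for the weights file. A future literal stores only a weight name plus the steps needed to rebuild the values, so two engines with different precision can share one set of weights.

```python
import struct
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class FinalizedLiteral:
    """Literal whose values are embedded directly in the engine."""
    values: List[float]


@dataclass
class FutureLiteral:
    """Hypothetical placeholder: a weight name plus the steps to rebuild it."""
    weight_name: str
    steps: List[str] = field(default_factory=list)


def to_fp16(values: List[float]) -> List[float]:
    # Round each value to IEEE half precision via struct's 'e' format.
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


# Assumed, illustrative set of recreation steps an engine could record.
STEPS: Dict[str, Callable[[List[float]], List[float]]] = {
    "to_fp16": to_fp16,
    "negate": lambda vs: [-v for v in vs],
}


def resolve(lit: Union[FinalizedLiteral, FutureLiteral],
            weights: Dict[str, List[float]]) -> FinalizedLiteral:
    """Turn a future literal into a finalized one at load time."""
    if isinstance(lit, FinalizedLiteral):
        return lit
    values = list(weights[lit.weight_name])  # copy; the shared weights stay untouched
    for step in lit.steps:
        values = STEPS[step](values)
    return FinalizedLiteral(values)


# One shared weights store, two stripped "engines" that reference it.
weights = {"fc.w": [1.0, 2.5, 0.1]}
engine_fp32 = FutureLiteral("fc.w")
engine_fp16 = FutureLiteral("fc.w", ["to_fp16"])

lit_fp32 = resolve(engine_fp32, weights)
lit_fp16 = resolve(engine_fp16, weights)
```

In this sketch the engines carry only the recipe, so the weight data exists once regardless of how many quantization or batch-size variants are compiled. Whether recreation steps should be recorded as named operations, as here, or as a small instruction sequence is left open.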