
Weight Stripping

Open hgaspar opened this issue 8 months ago • 7 comments

Enable creating engines (currently, MXR files, eventually perhaps dynamic objects) without embedding the weights in the engine.

Use cases: (1) Support compilation for various batch sizes without duplicating the weights. (2) Support multiple execution configurations with different quantization options (including mixed precision) without having to embed the weights in every engine created. (3) Multi-GPU execution may also benefit, especially when creating multiple multi-GPU execution configurations (partitions, execution schedules).

Technical considerations: How do we treat literals? Perhaps the MXR files need to contain the steps required to recreate the literals from the weights file, which may require a new type (finalized literals vs. future literals or meta-literals).
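
To make the finalized vs. future literal distinction concrete, here is a minimal sketch in plain Python. It is not MIGraphX code and does not reflect the MXR format: every name in it (`FutureLiteral`, `FinalizedLiteral`, `STEPS`, `finalize_engine`, the `scale` and `clamp` steps) is hypothetical and only illustrates the idea. It assumes a stripped engine stores, for each literal, the name of a weight plus the recorded steps needed to rebuild the literal, and that these steps are replayed against a separate weights file at load time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Tensor = List[float]

# Hypothetical registry of steps a stripped engine is allowed to record.
# Real steps would be things like casts, layout changes or quantization.
STEPS: Dict[str, Callable[[Tensor, float], Tensor]] = {
    "scale": lambda t, k: [x * k for x in t],
    "clamp": lambda t, k: [max(-k, min(k, x)) for x in t],
}


@dataclass(frozen=True)
class FinalizedLiteral:
    """Literal whose data is embedded directly in the engine."""
    data: Tuple[float, ...]


@dataclass(frozen=True)
class FutureLiteral:
    """Literal stored as a recipe: a weight name plus steps to replay."""
    weight_name: str
    steps: Tuple[Tuple[str, float], ...] = ()

    def finalize(self, weights: Dict[str, Tensor]) -> FinalizedLiteral:
        # Start from the shared weight and replay each recorded step in order.
        data = list(weights[self.weight_name])
        for name, arg in self.steps:
            data = STEPS[name](data, arg)
        return FinalizedLiteral(tuple(data))


Literal = Union[FinalizedLiteral, FutureLiteral]


def finalize_engine(literals: List[Literal],
                    weights: Dict[str, Tensor]) -> List[FinalizedLiteral]:
    """Resolve every future literal; already finalized literals pass through."""
    return [
        lit.finalize(weights) if isinstance(lit, FutureLiteral) else lit
        for lit in literals
    ]


# One weights "file" shared by two stripped engines with different recipes.
weights = {"fc.weight": [1.0, 2.0, 3.0]}
engine_a = [FutureLiteral("fc.weight")]
engine_b = [FutureLiteral("fc.weight", (("scale", 0.5), ("clamp", 1.0)))]

resolved_a = finalize_engine(engine_a, weights)
resolved_b = finalize_engine(engine_b, weights)
```

In this sketch both engines hold only a recipe, so the weight data exists once no matter how many engines are created, which is the property the use cases above ask for. Whether recorded steps should be replayed on the host at load time or compiled into the engine is left open here.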

hgaspar, Jun 21 '24 10:06