optimum
Add Param Cache For Recompilation
The parameter cache is needed to handle recompilation, where we must make sure that the parameters created in the first run are reused.

Currently this use case does not raise an error even without a param cache, because during recompilation we directly replace layers with the ones in the layer cache (their parameters are replaced automatically along with the layers). However, some parameters are not traced as part of a standalone module (for example, the layernorm weight). These still work for now because, if the original parameter is already on the current rank's device, we use it directly for initialization and weight loading instead of creating a new one.

This is not enough once we need to support third-party backends such as nanotron, which has its own parameter implementation, NanotronParameter. In those cases we do need to track all newly created parameters so that no new parameter is created during recompilation.
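
To illustrate the idea, here is a minimal sketch of a parameter cache keyed by parameter name. It is not the code in this PR: `ParamCache`, `get_or_create`, `BackendParameter`, and `compile_pass` are hypothetical names, and `BackendParameter` is only a stand-in for a backend-specific type such as NanotronParameter.

```python
from typing import Callable, Dict, Generic, Iterable, Tuple, TypeVar

P = TypeVar("P")


class ParamCache(Generic[P]):
    """Hypothetical cache mapping a parameter's qualified name to the
    object created for it during the first compilation."""

    def __init__(self) -> None:
        self._params: Dict[str, P] = {}

    def get_or_create(self, name: str, factory: Callable[[], P]) -> P:
        # The factory runs only the first time a name is seen; later
        # (re)compilations get back the object created in the first run.
        if name not in self._params:
            self._params[name] = factory()
        return self._params[name]

    def __len__(self) -> int:
        return len(self._params)


class BackendParameter:
    """Stand-in for a backend-specific parameter wrapper (assumption)."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape


def compile_pass(
    cache: ParamCache[BackendParameter], names: Iterable[str]
) -> Dict[str, BackendParameter]:
    # Simulates one compilation: every parameter is requested through the
    # cache rather than being constructed directly.
    return {
        name: cache.get_or_create(name, lambda: BackendParameter((4,)))
        for name in names
    }


cache: ParamCache[BackendParameter] = ParamCache()
names = ["layernorm.weight", "layernorm.bias"]

first = compile_pass(cache, names)   # first run creates the parameters
second = compile_pass(cache, names)  # recompilation reuses them

# No new parameter objects were created during recompilation.
assert all(first[n] is second[n] for n in names)
```

Keying by name means parameters that do not belong to a cached layer, like the layernorm weight mentioned above, are still tracked across recompilations.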