openvino
openvino copied to clipboard
Only clone weights when DAZ is not set
Details:
- Optimize weights memory usage by avoiding cloning weights with sub-normals when DAZ is set
- Introduce a util function
is_denormals_as_zero_set()
- fix duplicate definitions issue by separate
denormals.hpp
into a hpp and cpp
Tickets:
- CVS-82367
@maxnick Hi, Can you please review this PR?
As we discussed on the sync: lets use state of CPU_DENORMALS_OPTIMIZATION flag as a trigger for weights subnormals handling logic.
@dmitry-gorokhov @maxnick we will have difficulty when CPU_DENORMALS_OPTIMIZATION wasn't set and internal flag denormalsOptMode
will be DenormalsOptMode::DO_Keep
and we have no way to tell whether DAZ is set other than checking CPU MSR directly. @maxnick Any suggestion?
@dmitry-gorokhov @maxnick we will have difficulty when CPU_DENORMALS_OPTIMIZATION wasn't set and internal flag
denormalsOptMode
will beDenormalsOptMode::DO_Keep
and we have no way to tell whether DAZ is set other than checking CPU MSR directly. @maxnick Any suggestion?
Still not sure we actually need that DenormalsOptMode::DO_Keep
mode. As far as I understand it is for mimic the existing behavior with nullifying denormals in the input constants. Right? In that case we should make the denormal nullifying pass, don't we? Do I correctly understant that the plugin will set the DAZ and FTZ flags only in case DenormalsOptMode::Do_on
, than we can skip the pass in case DenormalsOptMode::Do_on
, because the flags are set, and in case DenormalsOptMode::Do_off
in case we want to process all denormals as is.
@maxnick I find that it's difficult to get graph's config inside constructor of Input node(for checking CPU_DENORMALS_OPTIMIZATION flag), actually it's hard to add any new input parameter into constructor of Node or Input w/o massively changing the prototype of all Node's constructor.
Thus I suggest to add one more generic tuple-like (or simply struct) parameter (like NodeRuntime
I proposed in another PR ) into Node's constructor for passing all additional configurations to node's constructor (other than existing ones), it would be easier to expand it in the further, what do you think or any other suggestion?
@maxnick I find that it's difficult to get graph's config inside constructor of Input node(for checking CPU_DENORMALS_OPTIMIZATION flag), actually it's hard to add any new input parameter into constructor of Node or Input w/o massively changing the prototype of all Node's constructor.
Thus I suggest to add one more generic tuple-like (or simply struct) parameter (like
NodeRuntime
I proposed in another PR ) into Node's constructor for passing all additional configurations to node's constructor (other than existing ones), it would be easier to expand it in the further, what do you think or any other suggestion?
@usstq , the time has come. Since we introduced the GraphContext
class has been introduced and now all the nodes have access to the graph parameters, all the concerns above may be addressed.
@maxnick I changed the implementation according to config, please review again, Thx!
@maxnick BTW, I found that even when denormalsOptMode
is DO_On
, the denormals_as_zero(true)
may return false on some cases, in which no DAZ flag can be set, so I need to record the return value of this function and use it to decide whether a manual flushing of denormals to zero is required.