openvino icon indicating copy to clipboard operation
openvino copied to clipboard

Only clone weights when DAZ is not set

Open usstq opened this issue 2 years ago • 4 comments

Details:

  • Optimize weights memory usage by avoiding cloning weights with sub-normals when DAZ is set
  • Introduce a util function is_denormals_as_zero_set()
  • fix duplicate definitions issue by separate denormals.hpp into a hpp and cpp

Tickets:

  • CVS-82367

usstq avatar Jul 27 '22 07:07 usstq

@maxnick Hi, Can you please review this PR?

usstq avatar Aug 10 '22 07:08 usstq

As we discussed on the sync: lets use state of CPU_DENORMALS_OPTIMIZATION flag as a trigger for weights subnormals handling logic.

dmitry-gorokhov avatar Sep 12 '22 08:09 dmitry-gorokhov

@dmitry-gorokhov @maxnick we will have difficulty when CPU_DENORMALS_OPTIMIZATION wasn't set and internal flag denormalsOptMode will be DenormalsOptMode::DO_Keep and we have no way to tell whether DAZ is set other than checking CPU MSR directly. @maxnick Any suggestion?

usstq avatar Sep 16 '22 03:09 usstq

@dmitry-gorokhov @maxnick we will have difficulty when CPU_DENORMALS_OPTIMIZATION wasn't set and internal flag denormalsOptMode will be DenormalsOptMode::DO_Keep and we have no way to tell whether DAZ is set other than checking CPU MSR directly. @maxnick Any suggestion?

Still not sure we actually need that DenormalsOptMode::DO_Keep mode. As far as I understand it is for mimic the existing behavior with nullifying denormals in the input constants. Right? In that case we should make the denormal nullifying pass, don't we? Do I correctly understant that the plugin will set the DAZ and FTZ flags only in case DenormalsOptMode::Do_on, than we can skip the pass in case DenormalsOptMode::Do_on, because the flags are set, and in case DenormalsOptMode::Do_off in case we want to process all denormals as is.

maxnick avatar Sep 22 '22 09:09 maxnick

@maxnick I find that it's difficult to get graph's config inside constructor of Input node(for checking CPU_DENORMALS_OPTIMIZATION flag), actually it's hard to add any new input parameter into constructor of Node or Input w/o massively changing the prototype of all Node's constructor.

Thus I suggest to add one more generic tuple-like (or simply struct) parameter (like NodeRuntime I proposed in another PR ) into Node's constructor for passing all additional configurations to node's constructor (other than existing ones), it would be easier to expand it in the further, what do you think or any other suggestion?

usstq avatar Sep 26 '22 06:09 usstq

@maxnick I find that it's difficult to get graph's config inside constructor of Input node(for checking CPU_DENORMALS_OPTIMIZATION flag), actually it's hard to add any new input parameter into constructor of Node or Input w/o massively changing the prototype of all Node's constructor.

Thus I suggest to add one more generic tuple-like (or simply struct) parameter (like NodeRuntime I proposed in another PR ) into Node's constructor for passing all additional configurations to node's constructor (other than existing ones), it would be easier to expand it in the further, what do you think or any other suggestion?

@usstq , the time has come. Since we introduced the GraphContext class has been introduced and now all the nodes have access to the graph parameters, all the concerns above may be addressed.

maxnick avatar Mar 02 '23 18:03 maxnick

@maxnick I changed the implementation according to config, please review again, Thx!

usstq avatar Mar 13 '23 06:03 usstq

@maxnick BTW, I found that even when denormalsOptMode is DO_On, the denormals_as_zero(true) may return false on some cases, in which no DAZ flag can be set, so I need to record the return value of this function and use it to decide whether a manual flushing of denormals to zero is required.

usstq avatar Mar 13 '23 07:03 usstq