raven icon indicating copy to clipboard operation
raven copied to clipboard

[DEFECT] ROM "info" is not used by most of the ROMs

Open aalfonsi opened this issue 3 years ago • 4 comments

Thank you for the defect report

Defect Description

the ROM info that was supposed to trigger the "normalization" or not of the training data in the supervised learning is not used generically. Only the skLearn roms use it in the SkLearn base class.

Steps to Reproduce

N/A

Expected Behavior

if I set info['normalize'] = True, the training data get normalized.

Screenshots and Input Files

No response

OS

MacOS

OS Version

No response

Dependency Manager

CONDA

For Change Control Board: Issue Review

  • [ ] Is it tagged with a type: defect or task?
  • [ ] Is it tagged with a priority: critical, normal or minor?
  • [ ] If it will impact requirements or requirements tests, is it tagged with requirements?
  • [ ] If it is a defect, can it cause wrong results for users? If so an email needs to be sent to the users.
  • [ ] Is a rationale provided? (Such as explaining why the improvement is needed or why current code is wrong.)

For Change Control Board: Issue Closure

  • [ ] If the issue is a defect, is the defect fixed?
  • [ ] If the issue is a defect, is the defect tested for in the regression test system? (If not explain why not.)
  • [ ] If the issue can impact users, has an email to the users group been written (the email should specify if the defect impacts stable or master)?
  • [ ] If the issue is a defect, does it impact the latest release branch? If yes, is there any issue tagged with release (create if needed)?
  • [ ] If the issue is being closed without a pull request, has an explanation of why it is being closed been provided?

aalfonsi avatar Apr 18 '22 17:04 aalfonsi

@aalfonsi FYI, I have checked the source code, the default behavior for supervised learning algorithms is to normalize the feature space. In the scikit-learn based rom, we override the normalization function to use the info dict to determine whether the rom need to be normalized or not. As you said, the info dict is not yet used by non-sklearn rom. The following are my suggestion, we can extend our rom input to allow user to enable or disable the normalization.

wangcj05 avatar Apr 21 '22 22:04 wangcj05

Yes. I agree with your idea. For the normalization, the _localNormalizeData is always called, but the only why to "bypass" it is to overload it (instead the info dictionary should be used). Probably also its scope (with an indication of which normalization to apply it would be useful).

aalfonsi avatar Apr 21 '22 22:04 aalfonsi

I think the overloading functionality from the SupervisedLearning interface is useful, as normalization strategies can also be flexible. If we rely on a dictionary with a boolean (or even a categorical) entry, then this entry has to be checked and interpreted in other places, rather than the method definition handling it.

PaulTalbot-INL avatar Apr 28 '22 20:04 PaulTalbot-INL

Yes. the overloading of the method is for sure useful and should be kept. If info['normalize']=False the normalization should not be called and the overloading of the _localNormalizeData should not be required.
Regarding the normalization "strategies", for most of the ROMs, the standard (z-normalization) scaling (what RAVEN has) should be enough. Sometimes a different normalization/scaling can be useful (for example to filter out outliers or to "remove" the distortion of the scaling if ROM coefficients need to be extracted) The _localNormalizeData has been implemented mostly to have special treatment for time-dependent (or, in the future, high-density) ROMs (and maybe also for this situation standardized approaches can be designed).

aalfonsi avatar Apr 29 '22 03:04 aalfonsi