raven
raven copied to clipboard
[DEFECT] ROM "info" is not used by most of the ROMs
Thank you for the defect report
- [X] I am using the latest version of
RAVEN. - [X] I have read the Wiki.
- [X] I have created a minimum, reproducible example that demonstrates the defect.
Defect Description
the ROM info that was supposed to trigger the "normalization" or not of the training data in the supervised learning is not used generically. Only the skLearn roms use it in the SkLearn base class.
Steps to Reproduce
N/A
Expected Behavior
if I set info['normalize'] = True, the training data get normalized.
Screenshots and Input Files
No response
OS
MacOS
OS Version
No response
Dependency Manager
CONDA
For Change Control Board: Issue Review
- [ ] Is it tagged with a type: defect or task?
- [ ] Is it tagged with a priority: critical, normal or minor?
- [ ] If it will impact requirements or requirements tests, is it tagged with requirements?
- [ ] If it is a defect, can it cause wrong results for users? If so an email needs to be sent to the users.
- [ ] Is a rationale provided? (Such as explaining why the improvement is needed or why current code is wrong.)
For Change Control Board: Issue Closure
- [ ] If the issue is a defect, is the defect fixed?
- [ ] If the issue is a defect, is the defect tested for in the regression test system? (If not explain why not.)
- [ ] If the issue can impact users, has an email to the users group been written (the email should specify if the defect impacts stable or master)?
- [ ] If the issue is a defect, does it impact the latest release branch? If yes, is there any issue tagged with release (create if needed)?
- [ ] If the issue is being closed without a pull request, has an explanation of why it is being closed been provided?
@aalfonsi FYI, I have checked the source code, the default behavior for supervised learning algorithms is to normalize the feature space. In the scikit-learn based rom, we override the normalization function to use the info dict to determine whether the rom need to be normalized or not. As you said, the info dict is not yet used by non-sklearn rom. The following are my suggestion, we can extend our rom input to allow user to enable or disable the normalization.
Yes. I agree with your idea. For the normalization, the _localNormalizeData is always called, but the only why to "bypass" it is to overload it (instead the info dictionary should be used). Probably also its scope (with an indication of which normalization to apply it would be useful).
I think the overloading functionality from the SupervisedLearning interface is useful, as normalization strategies can also be flexible. If we rely on a dictionary with a boolean (or even a categorical) entry, then this entry has to be checked and interpreted in other places, rather than the method definition handling it.
Yes. the overloading of the method is for sure useful and should be kept. If info['normalize']=False the normalization should not be called and the overloading of the _localNormalizeData should not be required.
Regarding the normalization "strategies", for most of the ROMs, the standard (z-normalization) scaling (what RAVEN has) should be enough. Sometimes a different normalization/scaling can be useful (for example to filter out outliers or to "remove" the distortion of the scaling if ROM coefficients need to be extracted)
The _localNormalizeData has been implemented mostly to have special treatment for time-dependent (or, in the future, high-density) ROMs (and maybe also for this situation standardized approaches can be designed).