mxnet
mxnet copied to clipboard
add ignore_reinit to initialize to skip warnings
Description
When net is partially initialized(which is very common), this PR allow
net.collect_params().initialize(ignore_reinit=True)
without throwing warnings for each initialized parameter. Otherwise user have to use warnings.catch_warnings if they already understand the situation.
This PR has no effect on any functionality, simply don't generate warnings if set to True.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage:
- Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
- Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
- Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [x] Code is well-documented:
- For user-facing API changes, API doc string has been updated.
- For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
- Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Changes
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
Comments
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
@piiswrong @szha
Yes, I agree that it’s not a good practice to add an option to disable warning.
However in this particular case, if user acknowledged that they understand the risk of already initialized parameters, there is no need to generate warnings any more, especially for a deep network it will propagate hundreds of warnings, suddenly block the entire screen
Please reconsider adding this feature.
I would argue that .initialize() is an idempotent operation like .hybridize(). In my opinion, it's very annoying, that attempting to initialize an already (maybe partially) initialized model prints countless warnings without any ability to silence them.
I am not suggesting, that it shouldn't print warnings at all, but at least having an option to silence them should be useful. If you need a precedent for a similar functionality, there is the allow_missing and ignore_extra parameters of mxnet.gluon.Block.load_parameters, although they ignore errors, not warnings. Also, as I mentioned, attempting to hybridize an already hybridized model doesn't print any warnings.
@zhreshold what's your current take on this issue?
I need more opinions, my personal feeling is that is much convenient to use the switch to turn off the warnings
To be honest, I don't quite understand your concern. It would be totally understandable, if this warning was signaling about something serious or if I was proposing to make it silent by default.
As I see it, a warning means that there is something in my code, that requires my attention. Either something is wrong in my code, or this warning was a false positive and my code does exactly what I want it to do. As soon, as I see this warning, I should go and either fix my code or explicitly specify, that I know, what I am doing.
Currently, there is no (elegant) way to silence the warning. So what is the "intended" way to fix this warning?
If somebody calls model.initialize() on a partially initialized/loaded model, what is the potential unexpected behaviour, that is being reported by the warnings? How should they change their code, so that the warning goes away?
FYI, currently to avoid the warning one should only collect the parameters that are known to be uninitialized. Either by passing a regex to collect_params(), or using model.subblockthatisnotinitiialized.initialize().
That may not be always easily feasible, so it's worth to consider adding the ignore_reinit option.
The current workaround is more safe, because it requires you to be explicit about what is uninitialized.
I would say it might be extra burden for users who don't know exactly what layers are inited or uninited and it may not be feasible to control all of them in case there are 1000 layers in the network while 500 are not initialized
@leezu, oh I didn't know, that collect_params accepted a regex. Nice.
Yeah, in my case, I dynamically assemble models from bits and pieces, based on a huge config file, with some pretrained gluon/gluoncv model zoo fragments, some newly created Blocks and possibly some blocks from our custom, previously pretrained models.
I guess, I could build some system around it, to keep track, which blocks need initialization or immediately initialize new blocks, when creating them, but just doing model.initialize(ctx=ctx) at the end is so much simpler and does exactly what I want it to do.
Also, some models in gluoncv.model_zoo are returned partially initialized in different ways, depending on which **kwargs you passed to get_model and I really don't want to "support" all the various initialization configurations.
@RuRo @zhreshold Yes, there may be use-cases where the regex is not sufficient.
I just wonder how common they are and if https://docs.python.org/3/library/warnings.html#temporarily-suppressing-warnings could sufficiently address them.
In either way, we can merge this PR if you think the use-cases are sufficiently important to introduce a new argument. I wouldn't mind. Just wanted to point out the downsides and the current workaround.
I'm open to introducing this if it improves experience. However, name like ignore_reinit doesn't seem obvious to me that it suppresses warnings. I'd be more glad to recommend accepting this if we could come up with a more intuitive name.
suppress_warnings would be generic? ignore_reinitialization would be too specific. Not sure what is the right middle. But I'm interested to see if this enhances user experience and if it's a customer ask.
However, name like
ignore_reinitdoesn't seem obvious to me that it suppresses warnings. I'd be more glad to recommend accepting this if we could come up with a more intuitive name.
I think, the option was called ignore_reinit for symmetry with the existing force_reinit option. Perhaps something like skip_reinit can better convey, what the proposed option does?
The reasoning would be that by default it is ambiguous, what to do with already initialized params (hence the warning). Specifying either force_reinit or skip_reinit would then explicitly specify, what behavior the user wants and so the warning would no longer be required.