mxnet icon indicating copy to clipboard operation
mxnet copied to clipboard

add ignore_reinit to initialize to skip warnings

Open zhreshold opened this issue 7 years ago • 13 comments

Description

When net is partially initialized(which is very common), this PR allow

net.collect_params().initialize(ignore_reinit=True) 

without throwing warnings for each initialized parameter. Otherwise user have to use warnings.catch_warnings if they already understand the situation.

This PR has no effect on any functionality, simply don't generate warnings if set to True.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [x] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [x] Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • [ ] To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [ ] Feature1, tests, (and when applicable, API doc)
  • [ ] Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

zhreshold avatar Jun 27 '18 21:06 zhreshold

@piiswrong @szha

zhreshold avatar Jun 28 '18 17:06 zhreshold

Yes, I agree that it’s not a good practice to add an option to disable warning.

However in this particular case, if user acknowledged that they understand the risk of already initialized parameters, there is no need to generate warnings any more, especially for a deep network it will propagate hundreds of warnings, suddenly block the entire screen

zhreshold avatar Jun 29 '18 20:06 zhreshold

Please reconsider adding this feature.

I would argue that .initialize() is an idempotent operation like .hybridize(). In my opinion, it's very annoying, that attempting to initialize an already (maybe partially) initialized model prints countless warnings without any ability to silence them.

I am not suggesting, that it shouldn't print warnings at all, but at least having an option to silence them should be useful. If you need a precedent for a similar functionality, there is the allow_missing and ignore_extra parameters of mxnet.gluon.Block.load_parameters, although they ignore errors, not warnings. Also, as I mentioned, attempting to hybridize an already hybridized model doesn't print any warnings.

ruro avatar Feb 04 '20 14:02 ruro

@zhreshold what's your current take on this issue?

leezu avatar Feb 04 '20 18:02 leezu

I need more opinions, my personal feeling is that is much convenient to use the switch to turn off the warnings

zhreshold avatar Feb 04 '20 18:02 zhreshold

To be honest, I don't quite understand your concern. It would be totally understandable, if this warning was signaling about something serious or if I was proposing to make it silent by default.

As I see it, a warning means that there is something in my code, that requires my attention. Either something is wrong in my code, or this warning was a false positive and my code does exactly what I want it to do. As soon, as I see this warning, I should go and either fix my code or explicitly specify, that I know, what I am doing.

Currently, there is no (elegant) way to silence the warning. So what is the "intended" way to fix this warning?

If somebody calls model.initialize() on a partially initialized/loaded model, what is the potential unexpected behaviour, that is being reported by the warnings? How should they change their code, so that the warning goes away?

ruro avatar Feb 04 '20 20:02 ruro

FYI, currently to avoid the warning one should only collect the parameters that are known to be uninitialized. Either by passing a regex to collect_params(), or using model.subblockthatisnotinitiialized.initialize(). That may not be always easily feasible, so it's worth to consider adding the ignore_reinit option.

The current workaround is more safe, because it requires you to be explicit about what is uninitialized.

leezu avatar Feb 04 '20 21:02 leezu

I would say it might be extra burden for users who don't know exactly what layers are inited or uninited and it may not be feasible to control all of them in case there are 1000 layers in the network while 500 are not initialized

zhreshold avatar Feb 04 '20 21:02 zhreshold

@leezu, oh I didn't know, that collect_params accepted a regex. Nice.

Yeah, in my case, I dynamically assemble models from bits and pieces, based on a huge config file, with some pretrained gluon/gluoncv model zoo fragments, some newly created Blocks and possibly some blocks from our custom, previously pretrained models.

I guess, I could build some system around it, to keep track, which blocks need initialization or immediately initialize new blocks, when creating them, but just doing model.initialize(ctx=ctx) at the end is so much simpler and does exactly what I want it to do.

Also, some models in gluoncv.model_zoo are returned partially initialized in different ways, depending on which **kwargs you passed to get_model and I really don't want to "support" all the various initialization configurations.

ruro avatar Feb 04 '20 22:02 ruro

@RuRo @zhreshold Yes, there may be use-cases where the regex is not sufficient.

I just wonder how common they are and if https://docs.python.org/3/library/warnings.html#temporarily-suppressing-warnings could sufficiently address them.

In either way, we can merge this PR if you think the use-cases are sufficiently important to introduce a new argument. I wouldn't mind. Just wanted to point out the downsides and the current workaround.

leezu avatar Feb 04 '20 23:02 leezu

I'm open to introducing this if it improves experience. However, name like ignore_reinit doesn't seem obvious to me that it suppresses warnings. I'd be more glad to recommend accepting this if we could come up with a more intuitive name.

szha avatar May 24 '20 05:05 szha

suppress_warnings would be generic? ignore_reinitialization would be too specific. Not sure what is the right middle. But I'm interested to see if this enhances user experience and if it's a customer ask.

ChaiBapchya avatar May 24 '20 07:05 ChaiBapchya

However, name like ignore_reinit doesn't seem obvious to me that it suppresses warnings. I'd be more glad to recommend accepting this if we could come up with a more intuitive name.

I think, the option was called ignore_reinit for symmetry with the existing force_reinit option. Perhaps something like skip_reinit can better convey, what the proposed option does?

The reasoning would be that by default it is ambiguous, what to do with already initialized params (hence the warning). Specifying either force_reinit or skip_reinit would then explicitly specify, what behavior the user wants and so the warning would no longer be required.

ruro avatar May 24 '20 09:05 ruro