# Mutation of init parameters
There was an active discussion in #14 about the drawbacks of passing initialized objects, such as callbacks, to the `NeuralNet` constructor. Let's have a focused discussion about this issue here.
From the original discussion:
> Thank you for the clarification, I understand your points. I think it is debatable whether this is compatible with the sklearn coding guidelines or not. I'd like to add a few more remarks, including my use cases.
>
> Before that, the issue for me is not strictly that the current `NeuralNet` does not support uninstantiated callbacks, it is that objects passed as input are mutated. If it were possible to clone the callbacks in `initialize_callbacks` before use, that would also solve the issues.
>
> > think of `callbacks` as an sklearn pipeline
>
> except `sklearn.pipeline.Pipeline` (and `FeatureUnion`) do not inherit from `BaseEstimator`, it's a different beast (a `_BaseComposition`). I'd imagine that a `NeuralNet` is not a composition, but an estimator (and a meta-estimator).
>
> I see two practical issues with instantiated callbacks passed as arguments:
>
> - In some cases, sklearn relies on hashing the input arguments stored as attributes to get a unique index of the estimator, for caching purposes. While I think that the caching utility in sklearn is really rough, the current `NeuralNet` is not compatible with it, because the hash differs before and after training. This is not a problem with sklearn's meta-estimators because instantiated estimators are hashable and always cloned before use.
> - It's harder to keep track of objects attached to callbacks when trying to clean things up. I've had use cases where the references in `callbacks` and the references in `callbacks_` diverged (because I saved and loaded callbacks) and it took me some time to understand why the RAM wouldn't empty (because of references attached in `callbacks`).
>
> (On a side note, speaking of references to old objects that should be flushed, it is not good to store a reference to the optimizer in LR schedulers like `ReduceLROnPlateau`, because if the optimizer is reinitialized during training, the reference in the scheduler is not updated. For my use I've rewritten those callbacks to avoid such an issue. In general, for all objects that should remain synchronized or that might weigh heavily in RAM, it should be carefully ensured that only one reference exists at any time, that this reference is attached to an attribute of the current `NeuralNet` instance, and that any interaction with those objects goes through this unique reference.)
Originally posted by @fcharras in https://github.com/dnouri/skorch/issues/14#issuecomment-460572165
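For concreteness, here is a minimal sketch of the mutation being described, assuming the pre-fix behavior discussed in this issue, namely that `callbacks_` ends up holding the very same instances that were passed via `callbacks` (the toy module is illustrative only):

```python
import torch
from skorch import NeuralNetClassifier
from skorch.callbacks import EpochScoring


class MyModule(torch.nn.Module):
    """Toy module, only here to make the net constructible."""
    def __init__(self):
        super().__init__()
        self.dense = torch.nn.Linear(20, 2)

    def forward(self, X):
        return torch.softmax(self.dense(X), dim=-1)


cb = EpochScoring(scoring='accuracy', lower_is_better=False)
net = NeuralNetClassifier(MyModule, callbacks=[cb])
net.initialize()

# The net holds the very same callback object that was passed in, so
# initializing/fitting mutates the user's instance instead of a copy.
assert any(c is cb for _, c in net.callbacks_)
```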
@fcharras
> Thank you for the clarification, I understand your points. I think it is debatable whether this is compatible with the sklearn coding guidelines or not. I'd like to add a few more remarks, including my use cases.
Let's not have this debate but rather focus on potential problems.
> I see two practical issues with instantiated callbacks passed as arguments:
>
> * In some cases, sklearn relies on hashing the input arguments stored as attributes to get a unique index of the estimator, for caching purposes. While I think that the caching utility in sklearn is really rough, the current `NeuralNet` is not compatible with it, because the hash differs before and after training. This is not a problem with sklearn's meta-estimators because instantiated estimators are hashable and always cloned before use.
Fair point, I can see this being a potential problem in the future.
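For concreteness, a rough sketch of how that concern can show up, using `joblib.hash` as an example hashing utility; the expected outcome assumes that initializing the callback mutates the instance the user passed in, as described above:

```python
from joblib import hash as joblib_hash
from skorch.callbacks import EpochScoring

cb = EpochScoring(scoring='accuracy', lower_is_better=False)

h_before = joblib_hash(cb)
cb.initialize()   # what happens to the user's instance during initialize_callbacks
h_after = joblib_hash(cb)

# If initialization mutates the instance, the hashes differ, so any caching
# scheme keyed on the estimator's init parameters no longer recognizes the net.
print(h_before == h_after)
```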
> * It's harder to keep track of objects attached to callbacks when trying to clean things up. I've had use cases where the references in `callbacks` and the references in `callbacks_` diverged (because I saved and loaded callbacks) and it took me some time to understand why the RAM wouldn't empty (because of references attached in `callbacks`).
Can you provide an example where this is happening? I would assume this is only possible when replacing callbacks using `set_params`?
As I see it, a potential workaround would be to clone the callbacks during `initialize_callbacks`, which would avoid both issues. I can't see any massive drawbacks right now, but I haven't thought about it for too long.
I believe the hashing part is not that important. AFAIK, when the pipeline caches, it only caches the transformers and not the final model. Since skorch nets would be the final model, they don't need to be cached. But even if it were a problem, I would guess that almost nobody uses that feature at the moment.
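As a side note on the caching point, this is the behavior being referred to (sketch only; `LogisticRegression` stands in for the final model):

```python
from tempfile import mkdtemp

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline(memory=...) memoizes only the fitted transformers; the final
# estimator (where a skorch net would sit) is refitted every time and
# never enters the cache.
pipe = Pipeline(
    [('scale', StandardScaler()),       # cached after the first fit
     ('clf', LogisticRegression())],    # final step: never cached
    memory=mkdtemp(),
)
```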
> Can you provide an example where this is happening?
Yes, an example would really help here.
> As I see it, a potential workaround would be to clone the callbacks during `initialize_callbacks`, which would avoid both issues.
If that solves the problem, we could think about it.
Thank you for opening up this thread :+1:
Callbacks have become my favorite way of scripting and automating almost everything I want to happen during a fit (initializations, lr management, calibration, ...). Before that I would script it inside a meta-estimator, but the code got messier for complex scenarios because it required nesting several meta-estimators, and then it's harder to reach the `NeuralNet` instance. With callbacks, all needed rules are implemented in separate callbacks that can all equally access the `NeuralNet`, because it's an argument of the callback methods.
You might find this convoluted, but I've actually found it to be a satisfying code organisation to describe the training process 100% within a single sklearn estimator object (which is itself embedded into a larger sklearn pipeline including features, cache management, ...), which is my goal.
A few of my callbacks have interactions, for instance a callback that saves the state of another callback and restores it later. This is done by pickling the callback, then unpickling it for restoring, and replacing the current instance in `callbacks_` with the unpickled one. This causes the references in `callbacks` and `callbacks_` to diverge, hence one must remember to clean both references if troublesome objects are attached to them (e.g. I had an issue with the `EpochScoring` cache, which caused pickling to last forever).
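Roughly, the save/restore pattern described here looks like the following sketch, reusing the initialized `net` from the snippet in the opening comment (which callback gets snapshotted is arbitrary here):

```python
import pickle

# callbacks_ holds (name, callback) pairs; snapshot one callback's state ...
name, cb = net.callbacks_[-1]
snapshot = pickle.dumps(cb)

# ... and later swap the unpickled copy back into callbacks_.
restored = pickle.loads(snapshot)
net.callbacks_[-1] = (name, restored)

# net.callbacks still points at the original instance, so callbacks and
# callbacks_ now reference different objects, and whatever is attached to the
# old instance (e.g. a scoring cache) stays alive until both are cleared.
```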
> If that solves the problem, we could think about it
This would 100% solve the problem and would actually make callback management consistent with sklearn's meta-estimators, where the wrapped estimator is instantiated when given as an argument but never touched in `fit`, because it is cloned beforehand (`self.estimator_ = clone(self.estimator)`) and only the clone is fitted.
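For comparison, this is the meta-estimator convention being referred to, in minimal form:

```python
from sklearn.base import BaseEstimator, clone


class WrappedEstimator(BaseEstimator):
    """Minimal meta-estimator: the wrapped estimator is stored untouched."""

    def __init__(self, estimator):
        self.estimator = estimator                 # kept exactly as passed, never mutated

    def fit(self, X, y=None):
        self.estimator_ = clone(self.estimator)    # only the clone gets fitted
        self.estimator_.fit(X, y)
        return self
```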
Another point I wanted to raise (but maybe it's a separate issue) is the danger of storing a reference to the optimizer within some lr scheduler callbacks (namely `ReduceLROnPlateau` and `CyclicLR`), because if the reference `optimizer_` changes in the `NeuralNet` instance, the reference within the callback is not updated, which breaks the training (because now there are two instantiated optimizers in memory, and the callback keeps updating the wrong one).
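A minimal sketch of that stale-reference problem, using pytorch directly (the parameter list is a stand-in for a module's parameters):

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]
opt_old = torch.optim.SGD(params, lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(opt_old)

# e.g. the net re-creates optimizer_ during training ...
opt_new = torch.optim.SGD(params, lr=0.1)

# ... but the scheduler silently keeps driving the old optimizer.
assert scheduler.optimizer is opt_old
assert scheduler.optimizer is not opt_new
```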
Reading the code, it seems that this does not really fit with how we should use skorch anyway; there are workarounds just for those, e.g. https://github.com/dnouri/skorch/blob/master/skorch/callbacks/lr_scheduler.py#L155 . Surely it's how it's done in pytorch, but I'd argue that their implementation is shaky, and that those callbacks are conceptually simple enough and easy enough to maintain to be re-implemented 100% within skorch to fit the skorch API better (I'd be happy to provide mine :smile: ).
Thank you for taking the time to explain your situation.
First, about cloning callbacks: I currently see no issue with that. The `callbacks` attribute would then still contain the original callbacks provided by the user, while the `callbacks_` attribute would contain the cloned callbacks. We would need to make sure to cover the 3 possible types that can be passed (a rough sketch follows the list below):

- initialized callbacks
- uninitialized callbacks
- `None`
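A rough sketch of what that could look like, covering the three cases; this only illustrates the idea and is not meant as the actual `initialize_callbacks` implementation (it assumes `clone` works on callback instances and that `initialize()` returns the callback):

```python
from sklearn.base import clone


def clone_and_initialize_callbacks(callbacks):
    """Return initialized copies of ``callbacks``, leaving the user's objects untouched."""
    if callbacks is None:                 # case 3: None
        return []
    callbacks_ = []
    for cb in callbacks:
        if isinstance(cb, type):          # case 2: uninitialized callback class
            cb = cb()
        else:                             # case 1: initialized instance -> work on a clone
            cb = clone(cb)
        callbacks_.append(cb.initialize())
    return callbacks_
```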
@fcharras are you willing to take on that task?
Regarding the issue with callbacks holding references to the optimizer: You are right that this can be problematic. In addition to what you said, stray references can cause trouble when saving and loading models on machines with or without CUDA availability.
As to the solution: On the one hand, using pytorch's classes pushes us towards referencing the optimizer. On the other hand, if we re-implement the functionality, we always need to make sure to stay up-to-date with pytorch. Our philosophy for skorch is to re-use as much as possible to keep friction low.
Ideally, we find a solution that still re-uses most of what pytorch has to offer without needing to reference the optimizer. I agree that parts of the learning rate schedulers need re-writing, as they are un-sklearn-y. If you have a suggestion, you can use the new github PR feature to propose it.
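One possible shape for such a scheduler callback is to avoid holding a reference to the optimizer at all and look up `net.optimizer_` on every call instead (sketch only, with a made-up decay rule):

```python
from skorch.callbacks import Callback


class DecayLROnEpochEnd(Callback):
    """Illustrative scheduler that never stores the optimizer."""

    def __init__(self, factor=0.9):
        self.factor = factor

    def on_epoch_end(self, net, **kwargs):
        # Always operate on the net's *current* optimizer, so re-initializing
        # optimizer_ cannot leave a stale reference behind.
        for group in net.optimizer_.param_groups:
            group['lr'] *= self.factor
```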
I started to work on the callback task today. It breaks a certain number of tests (some because of the mock system, and because by design sklearn breaks when it tries to clone an object whose parameters contain uninitialized objects that have a `get_params` method...). Fixing the tests doesn't seem too uncertain, but I have been lacking time for open source work lately; I'll keep you updated.
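For reference, a minimal sketch of the kind of failure meant here, assuming it is sklearn's `clone` choking on an uninitialized callback class (the exact exception depends on the sklearn version):

```python
from sklearn.base import clone
from skorch.callbacks import EpochScoring

clone(EpochScoring('accuracy'))   # an initialized callback instance can be cloned

try:
    clone(EpochScoring)           # an uninitialized callback class cannot
except TypeError as exc:
    print(exc)
```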