skorch

Checkpoint dirname argument can also be a callable

Open · BenjaminBossan opened this issue 2 years ago

Solves #848

Description

Currently, the dirname argument of Checkpoint can only be a string. With this update, it can also be a callable that takes no arguments and returns a string.
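The two accepted forms can be sketched with a small helper. `resolve_dirname` is hypothetical and not part of skorch's API, and the type check on the result is an assumption added for illustration. A string is used as-is, and a zero-argument callable is called to produce the string.

```python
from typing import Callable, Union

# A dirname is either a plain string or a zero-argument factory returning one.
DirnameArg = Union[str, Callable[[], str]]


def resolve_dirname(dirname: DirnameArg) -> str:
    """Return the directory name, calling ``dirname`` first if it is callable.

    Hypothetical helper for illustration; skorch's internals may differ.
    """
    result = dirname() if callable(dirname) else dirname
    # Assumed sanity check: the final value must be a string.
    if not isinstance(result, str):
        raise TypeError(f"dirname must be (or return) a str, got {type(result).__name__}")
    return result


print(resolve_dirname("exp-1"))          # exp-1
print(resolve_dirname(lambda: "exp-2"))  # exp-2
```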

This allows the directory a model is saved in to contain a dynamic element. For example, if you run a grid search with n_jobs>1 and use Checkpoint, each Checkpoint instance can get its own directory name (e.g., from a function that returns a random name), while the files inside the directory keep the same names.
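To illustrate the resulting layout, the sketch below builds checkpoint paths for three jobs. It swaps in `uuid.uuid4` for the `random.randint` used later in this PR, which makes name collisions practically impossible. The factory `make_dirname` and the file name `params.pt` are illustrative assumptions.

```python
import os
import uuid


def make_dirname() -> str:
    # Hypothetical factory: each call yields a fresh, practically unique name.
    return f"checkpoint-{uuid.uuid4().hex}"


filename = "params.pt"  # the file name is identical for every job

# Each Checkpoint instance would resolve its own dirname once.
paths = [os.path.join(make_dirname(), filename) for _ in range(3)]

for path in paths:
    print(path)  # same file name, different parent directory
```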

Without this option, if a user runs a grid search with n_jobs>1 and Checkpoint with load_best=True, all jobs write their checkpoints to the same files, so each job loads whichever model happened to be saved last. This results in errors when the model architectures differ and in silently wrong results when they don't.
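A minimal simulation of the problem, under simplifying assumptions: plain text stands in for the saved parameters, and the jobs run sequentially. All jobs save first and then all of them load, which mimics grid-search workers whose training overlaps. `run_jobs` is a hypothetical helper, not skorch code.

```python
import os
import tempfile


def run_jobs(dirnames):
    """Each job saves its 'params', then each job loads from its dirname."""
    for job_id, dirname in enumerate(dirnames):
        os.makedirs(dirname, exist_ok=True)
        with open(os.path.join(dirname, "params.pt"), "w") as f:
            f.write(f"params of job {job_id}")  # text stands in for tensors
    loaded = []
    for dirname in dirnames:
        with open(os.path.join(dirname, "params.pt")) as f:
            loaded.append(f.read())
    return loaded


with tempfile.TemporaryDirectory() as root:
    shared = [os.path.join(root, "shared")] * 3                   # one dir for all jobs
    separate = [os.path.join(root, f"job-{i}") for i in range(3)]  # one dir per job
    shared_result = run_jobs(shared)
    separate_result = run_jobs(separate)

print(shared_result)    # every job loads "params of job 2"
print(separate_result)  # each job loads its own params
```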

If anyone has a better idea how to solve the underlying problem, I'm open to it.

Implementation

Since dirname is no longer necessarily known at __init__ time, I removed the filename validation from __init__. The filenames are still validated inside initialize, which I believe is sufficient.
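The split between __init__ and initialize can be sketched with a heavily simplified, hypothetical stand-in for Checkpoint. `CheckpointSketch` is not skorch's implementation, and the specific validation rule (a dirname only makes sense when `f_params` is a string rather than a file object) is an assumption for illustration. The point is only when the dirname is resolved and checked.

```python
from typing import Callable, Optional, Union

DirnameArg = Union[str, Callable[[], str]]


class CheckpointSketch:
    """Hypothetical stand-in showing *when* dirname is resolved and validated."""

    def __init__(self, dirname: DirnameArg = "", f_params: Optional[object] = "params.pt"):
        # __init__ only stores its arguments (scikit-learn convention);
        # nothing is called or validated here.
        self.dirname = dirname
        self.f_params = f_params

    def initialize(self):
        # The dirname is resolved exactly once, right before it is needed.
        dirname = self.dirname() if callable(self.dirname) else self.dirname
        self._validate_filenames(dirname)
        self.dirname_ = dirname
        return self

    def _validate_filenames(self, dirname: str) -> None:
        # Assumed example check: a directory only makes sense when the
        # target is a file name (str), not an open file object.
        if dirname and self.f_params is not None and not isinstance(self.f_params, str):
            raise ValueError("dirname can only be used when f_params is a string")


calls = []


def make_dirname() -> str:
    calls.append("called")  # record each invocation of the factory
    return "run-a"


cp = CheckpointSketch(dirname=make_dirname)
print(len(calls))               # 0: __init__ did not call the factory
cp.initialize()
print(len(calls), cp.dirname_)  # 1 run-a
```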

In theory, we could call the dirname function inside __init__ to validate it, and then call it again inside initialize to actually set it, but I don't like that. We would be calling a function that is possibly non-deterministic or has side effects twice, with unknown consequences. This should be avoided if possible.
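A concrete case of the problem with calling the factory twice: a plausible factory that creates its directory on each call, here a hypothetical `make_dirname` built on `tempfile.mkdtemp`. Calling it once "to validate" and once "to use" mimics what a double call in `__init__` and `initialize` would do.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()


def make_dirname() -> str:
    # Hypothetical factory with a side effect: it creates a new directory.
    return tempfile.mkdtemp(dir=root, prefix="ckpt-")


validated = make_dirname()  # e.g., called in __init__ for validation
used = make_dirname()       # e.g., called again in initialize
n_dirs = len(os.listdir(root))

print(validated == used)  # False: validation checked a different name
print(n_dirs)             # 2: one orphaned directory was left behind

shutil.rmtree(root)  # clean up the demo directories
```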

Example

Before, code like this would fail:

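from sklearn.model_selection import GridSearchCV
from skorch import NeuralNetClassifier
from skorch.callbacks import Checkpoint

# ClassifierModule, X, and y are assumed to be defined elsewhere
# (a PyTorch module taking a num_units argument, plus training data).

params = {
    'module__num_units': [10, 20, 30],
}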

net = NeuralNetClassifier(
    ClassifierModule,
    callbacks=[Checkpoint(f_history=None, load_best=True)]
)
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', n_jobs=3)
gs.fit(X, y)

# error message:
# RuntimeError: Error(s) in loading state_dict for ClassifierModule:
#	size mismatch for dense0.weight: copying a param with shape torch.Size([20, 20]) from checkpoint, the shape in current model is torch.Size([10, 20]).

With this feature, we can do:

import random
def make_dirname():
    return f"test-dir-{random.randint(0, 1_000_000)}"

net = NeuralNetClassifier(
    ClassifierModule,
    callbacks=[Checkpoint(f_history=None, load_best=True, dirname=make_dirname)]
)
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', n_jobs=3)
gs.fit(X, y)

and it works.

BenjaminBossan, Jul 12 '22