SlurmCluster without hyperparameters

Open AlexSchuy opened this issue 3 years ago • 1 comments

I'm attempting to train a PytorchLightning model on a slurm cluster, and the PytorchLightning documentation recommends using the SlurmCluster class from this package to automate submission of slurm scripts. However, the examples all involve running a hyperparameter scan, whereas I would like to train just a single model. My attempt at doing so is as follows:

cluster = SlurmCluster()
[...] (set cluster.per_experiment_nb_cpus, cluster.job_time, etc.)
cluster.optimize_parallel_cluster_gpu(train, nb_trials=1, ...)

However, this fails with:

Traceback (most recent call last):
  File "train.py", line 67, in hydra_main
    train, nb_trials=1, job_name='pl-slurm', job_display_name='pl-slurm')
  File "/global/u2/s/schuya/.local/cori/pytorchv1.5.0-gpu/lib/python3.7/site-packages/test_tube/hpc.py", line 127, in optimize_parallel_cluster_gpu
    enable_auto_resubmit, on_gpu=True)
  File "/global/u2/s/schuya/.local/cori/pytorchv1.5.0-gpu/lib/python3.7/site-packages/test_tube/hpc.py", line 167, in __optimize_parallel_cluster_internal
    if self.is_from_slurm_object:
AttributeError: 'SlurmCluster' object has no attribute 'is_from_slurm_object'

Looking at the code, it seems that SlurmCluster.is_from_slurm_object was never set. This is because I did not pass in a hyperparam_optimizer, since I did not intend to perform a scan. What is the correct way to go about this?

AlexSchuy avatar Aug 20 '20 20:08 AlexSchuy

I know this is pretty old, but I just stumbled across your question. The simplest solution is probably to pass a hyperparameter optimizer (a HyperOptArgumentParser) that doesn't define any options to optimize; with nothing to scan, nb_trials=1 just runs your single model. Either way, the docs should note somewhere that test tube requires an optimizer to be set.

florianblume avatar Feb 03 '22 06:02 florianblume