ray_lightning

Cannot checkpoint and log

Open lcaquot94 opened this issue 2 years ago • 1 comments

The documentation says that when using Ray Client, you must disable checkpointing and logging for your Trainer by setting checkpoint_callback and logger to False. So how can we log and save the model during training?

lcaquot94, Nov 18 '22 13:11

I have been doing this:

  1. Import TuneReportCheckpointCallback from ray_lightning.tune:
    from ray_lightning.tune import TuneReportCheckpointCallback
  2. Disable checkpointing by setting "enable_checkpointing": False in the pl Trainer's configuration (trainer_config below).
  3. Initialize the logger:
    from pytorch_lightning import loggers as pl_loggers
    tb_logger = pl_loggers.TensorBoardLogger(save_dir="/tmp/some-dir")
  4. Initialize the Ray strategy:
    from ray_lightning import RayStrategy
    strategy = RayStrategy(num_workers=1, num_cpus_per_worker=1, use_gpu=True)
  5. Initialize the trainer:
    import pytorch_lightning as pl
    trainer = pl.Trainer(
        **trainer_config,
        # Report the logged "accuracy" metric to Tune and save a checkpoint at each epoch end
        callbacks=[TuneReportCheckpointCallback({"accuracy": "accuracy"}, on="epoch_end")],
        strategy=strategy,
        logger=tb_logger,
    )
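
The trainer_config dictionary unpacked into pl.Trainer is not shown in this answer. As a minimal sketch, assuming it is a plain dict of Trainer keyword arguments, it might look like the following. The max_epochs entry and its value are illustrative assumptions. Only the enable_checkpointing entry comes from the steps listed here.

```python
# Hypothetical Trainer configuration: only "enable_checkpointing" comes from
# the steps in this answer; "max_epochs" is an illustrative assumption.
trainer_config = {
    "max_epochs": 10,               # assumed value, adjust to your run
    "enable_checkpointing": False,  # Lightning's own checkpointing is switched off
}
```

Keys such as callbacks, strategy, and logger should stay out of this dict, since they are passed explicitly to pl.Trainer and a duplicate keyword argument would raise a TypeError.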

bparaj, Dec 10 '22 03:12