Ray lightning opens a new mlflow run

Open AugustoPeres opened this issue 2 years ago • 0 comments

I have a training script that uses ray, pytorch lightning and mlflow. When I try to use ray lightning, it seems to open another mlflow run:

First in my script I have the code:

def _log_parameters(**kwargs):
    for key, value in kwargs.items():
        mlflow.log_param(str(key), value)

def main():
    mlflow.start_run()
    _log_parameters(
        dim_model=FLAGS.dim_model,
        learning_rate=FLAGS.learning_rate,
        # ... plus some other parameters coming from flags
    )

I then move on to training with ray:

    ray.init(address='auto')
    plugin = RayStrategy(num_workers=FLAGS.num_workers,
                         num_cpus_per_worker=FLAGS.num_cpus_per_worker,
                         use_gpu=FLAGS.use_gpu)
    trainer = pl.Trainer(max_epochs=FLAGS.max_epochs,
                         strategy=plugin,
                         logger=False,
                         callbacks=all_callbacks,
                         precision=int(FLAGS.precision))
    trainer.fit(model, training_data_loader, validation_data_loader)

The problem is that all parameters logged with _log_parameters appear in one run, while all the metrics logged by the callbacks appear in another run.

If I train without ray, everything works as expected. I do not understand why ray is opening another run. Is there a way to prevent this?
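
A possible explanation, stated as an assumption rather than a confirmed diagnosis: mlflow tracks the active run per process, and the callbacks execute inside Ray worker processes where the run started in main() is not active, so the first logging call there may implicitly create a new run. The sketch below shows one way a worker could re-attach to the original run by its ID. The helper name ensure_parent_run and the idea of calling it from a callback hook are hypothetical and not part of ray_lightning or mlflow; only mlflow.active_run, mlflow.end_run, mlflow.set_tracking_uri and mlflow.start_run(run_id=...) are existing mlflow calls.

```python
def ensure_parent_run(run_id, tracking_uri=None):
    """Make `run_id` the active mlflow run in the current process.

    Hypothetical helper: meant to be called inside the worker process
    (for example at the start of a callback hook) before any metric is logged.
    """
    # Imported lazily so the import happens in the process that calls the helper.
    import mlflow

    if tracking_uri is not None:
        # Workers may not inherit the driver's tracking URI (assumption).
        mlflow.set_tracking_uri(tracking_uri)

    active = mlflow.active_run()
    if active is not None:
        if active.info.run_id == run_id:
            # Already attached to the intended run; nothing to do.
            return active
        # A different run is active in this process; close it first.
        mlflow.end_run()

    # Passing run_id resumes the existing run instead of creating a new one.
    return mlflow.start_run(run_id=run_id)
```

In main(), the ID would come from the run that is already started there, e.g. run = mlflow.start_run() followed by run.info.run_id, and would be handed to the callbacks together with the tracking URI. Whether the hooks in all_callbacks actually run in the worker process under RayStrategy is not verified here.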

AugustoPeres avatar Oct 31 '22 17:10 AugustoPeres