
Port Ray example to Ignite

Open vfdev-5 opened this issue 3 years ago • 14 comments

  • https://docs.ray.io/en/master/tune/examples/tune-pytorch-cifar.html

vfdev-5 avatar Mar 03 '21 14:03 vfdev-5

Hi! I do not have much experience in distributed algorithms but I really like them and am learning them. I think it'll be really great if I could work on this, as it'll provide some great exposure to ML and distributed workflows (both of which I really like :D). However, I am not sure I'll be able to work on it at a very fast pace, so if it's urgent (or not doable by beginners) then someone else can please take it up; if not, I'd love to work on it :)

Devanshu24 avatar Mar 06 '21 15:03 Devanshu24

@vfdev-5 Your idea is to use ray.tune as in the doc you mentioned? I mean, the experiment tool?

@Devanshu24 If so, the baseline for this should be our CIFAR10 distributed training use case. Please see https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10

If you are motivated to learn about distributed training, why not have a look at the link above? Before going further, it would be important to be comfortable with this. What do you think?

sdesrozis avatar Mar 06 '21 15:03 sdesrozis

Thanks for the reply @sdesrozis! To confirm I am getting it correctly: we want to use ray.tune and the other distributed utilities provided by Ray, and see how it performs in comparison to the CIFAR10 example already in Ignite (https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10). Correct? If so, then sure, I completely agree. I'll start by going through the Ignite example, hopefully make some headway, and then start on the Ray implementation! :D

Devanshu24 avatar Mar 06 '21 16:03 Devanshu24

The idea is to port this example (https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/cifar10_pytorch.py):

  • use Ignite for training and validation
  • use ray tune for hyperparam tuning

as a simple script file to examples/contrib/cifar10_ray_tune

A great addition would be a PR to the Ray docs with this example.
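
Roughly, the trainable could wrap an Ignite trainer/evaluator and report the validation metrics back to Tune from an epoch-completed handler. An untested sketch, just to give the idea (Net and get_data_loaders are placeholders for the CIFAR10 model and data loaders):

from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss
from ray import tune
import torch.nn as nn
from torch.optim import SGD

def train_cifar(config):
    # Net() and get_data_loaders() are placeholders for the CIFAR10 model / data
    model = Net()
    train_loader, val_loader = get_data_loaders(config["batch_size"], config["batch_size"])
    optimizer = SGD(model.parameters(), lr=config["lr"], momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    trainer = create_supervised_trainer(model, optimizer, criterion)
    evaluator = create_supervised_evaluator(
        model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)})

    @trainer.on(Events.EPOCH_COMPLETED)
    def report_to_tune(engine):
        evaluator.run(val_loader)
        metrics = evaluator.state.metrics
        # hand the validation metrics back to Ray Tune after every epoch
        tune.report(loss=metrics["loss"], mean_accuracy=metrics["accuracy"])

    trainer.run(train_loader, max_epochs=config["max_epochs"])

analysis = tune.run(
    train_cifar,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
        "max_epochs": 10,
    },
    num_samples=8,
    metric="loss",
    mode="min")

print("Best hyperparameters found were: ", analysis.best_config)

The scheduler / reporter pieces from the Ray tutorial (ASHAScheduler, CLIReporter) would then plug into tune.run unchanged.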

vfdev-5 avatar Mar 07 '21 16:03 vfdev-5

Can't we do it the same way Ray is implemented for PL? By creating callbacks?

Rajathbharadwaj avatar Mar 08 '21 11:03 Rajathbharadwaj

Can't we do it the same way Ray is implemented for PL? By creating callbacks?

Could you please detail your idea?

vfdev-5 avatar Mar 08 '21 11:03 vfdev-5

https://docs.ray.io/en/master/tune/tutorials/tune-pytorch-lightning.html#training-with-gpus

Similar to the above, an abstract implementation:

import os
import shutil
import sys
import tempfile

import torch
import torch.nn as nn
from torch.optim import SGD

from ignite.engine import create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss
from ignite.utils import setup_logger

from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler

# proposed integration module (does not exist yet), mirroring
# ray.tune.integration.pytorch_lightning.TuneReportCallback
from ray.tune.integration.pytorch_ignite import TuneReportCallback

def run(train_batch_size, val_batch_size, epochs, lr, momentum, log_dir):
    # Net() and get_data_loaders() are placeholders for the model / data
    # definitions (e.g. as in the existing Ignite MNIST example)
    train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)
    model = Net()

    # --- hypothetical PL-style hook-up of the proposed TuneReportCallback ---
    # (pi.Trainer mirrors pytorch_lightning's Trainer API and is only an
    #  illustration; Ignite attaches handlers to engines instead of callbacks)
    trainer = pi.Trainer(
        ...,
        callbacks=[
            TuneReportCallback(
                {
                    "loss": "ptl/val_loss",
                    "mean_accuracy": "ptl/val_accuracy"
                },
                on="validation_end")
        ])
    device = "cpu"

    if torch.cuda.is_available():
        device = "cuda"

    model.to(device)  # Move model before creating optimizer
    optimizer = SGD(model.parameters(), lr=lr, momentum=momentum)
    criterion = nn.CrossEntropyLoss()
    trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
    trainer.logger = setup_logger("Trainer")

    if sys.version_info > (3,):
        from ignite.contrib.metrics.gpu_info import GpuInfo

        try:
            GpuInfo().attach(trainer)
        except RuntimeError:
            print(
                "INFO: By default, in this example it is possible to log GPU information (used memory, utilization). "
                "As there is no pynvml python package installed, GPU information won't be logged. Otherwise, please "
                "install it : `pip install pynvml`"
            )

    metrics = {"accuracy": Accuracy(), "loss": Loss(criterion)}

    train_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    train_evaluator.logger = setup_logger("Train Evaluator")
    validation_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    validation_evaluator.logger = setup_logger("Val Evaluator")
    # ... attach the evaluators to the trainer and call trainer.run(train_loader,
    # max_epochs=epochs) as in the existing Ignite examples ...


def tune_mnist_asha(num_samples=10, num_epochs=10, gpus_per_trial=0):
    data_dir = os.path.join(tempfile.gettempdir(), "mnist_data_")

    config = {
        "layer_1_size": tune.choice([32, 64, 128]),
        "layer_2_size": tune.choice([64, 128, 256]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
    }

    scheduler = ASHAScheduler(
        max_t=num_epochs,
        grace_period=1,
        reduction_factor=2)

    reporter = CLIReporter(
        parameter_columns=["layer_1_size", "layer_2_size", "lr", "batch_size"],
        metric_columns=["loss", "mean_accuracy", "training_iteration"])

    analysis = tune.run(
        # train_mnist_tune is the trainable: an Ignite training function like
        # run() above, adapted to accept a Tune config
        tune.with_parameters(
            train_mnist_tune,
            data_dir=data_dir,
            num_epochs=num_epochs,
            num_gpus=gpus_per_trial),
        resources_per_trial={
            "cpu": 1,
            "gpu": gpus_per_trial
        },
        metric="loss",
        mode="min",
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter,
        name="tune_mnist_asha")

    print("Best hyperparameters found were: ", analysis.best_config)

    shutil.rmtree(data_dir)

Since most of the heavy lifting is done by Ray, what I was thinking is that we could extrapolate by adding a pytorch_ignite module to the ray.tune.integration namespace and implementing Ignite's particular way of hooking into it.

Rajathbharadwaj avatar Mar 08 '21 12:03 Rajathbharadwaj

@Rajathbharadwaj thanks for the details. Yes, this would be great!

vfdev-5 avatar Mar 08 '21 13:03 vfdev-5

Awesome, I'll work on the integration. Any tips would be awesome!

https://github.com/ray-project/ray/blob/master/python/ray/tune/integration/pytorch_lightning.py

Converting this to PyTorch Ignite's way of implementing it.
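
For Ignite, the equivalent of the PL callback would probably be a handler attached to the evaluator engine that forwards engine.state.metrics to Tune. Something along these lines (an untested sketch of the proposed integration, not an existing API; TuneReportHandler is a made-up name):

from typing import Dict, Optional

from ignite.engine import Engine, Events
from ray import tune

class TuneReportHandler:
    """Sketch of an Ignite counterpart to the PL TuneReportCallback.

    Attach it to the engine whose metrics should be forwarded to Ray Tune,
    typically the validation evaluator.
    """

    def __init__(self, metrics: Optional[Dict[str, str]] = None):
        # mapping: name reported to Tune -> metric name on the engine
        self._metrics = metrics

    def attach(self, engine: Engine, event=Events.COMPLETED):
        engine.add_event_handler(event, self)

    def __call__(self, engine: Engine):
        available = engine.state.metrics
        if self._metrics is None:
            report = dict(available)
        else:
            report = {tune_name: available[engine_name]
                      for tune_name, engine_name in self._metrics.items()}
        tune.report(**report)

# usage inside a Tune trainable:
# evaluator = create_supervised_evaluator(model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)})
# TuneReportHandler({"mean_accuracy": "accuracy", "loss": "loss"}).attach(evaluator)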

Rajathbharadwaj avatar Mar 08 '21 13:03 Rajathbharadwaj

@Rajathbharadwaj any updates on this porting?

vfdev-5 avatar Mar 22 '21 13:03 vfdev-5

Hey @vfdev-5, I got a bit held up. But I'm working on it. Will ping you.

Rajathbharadwaj avatar Mar 24 '21 02:03 Rajathbharadwaj

@Rajathbharadwaj still working on this issue?

vfdev-5 avatar May 15 '21 14:05 vfdev-5

Hey @vfdev-5, if no one else is working on this, can I pick this up?

gucifer avatar Feb 12 '22 07:02 gucifer

Hey @vfdev-5, if no one else is working on this, can I pick this up?

Sure, go ahead. Thanks!

vfdev-5 avatar Feb 12 '22 09:02 vfdev-5