Port Ray example to Ignite
- https://docs.ray.io/en/master/tune/examples/tune-pytorch-cifar.html
Hi! I do not have much experience with distributed algorithms, but I really like them and am learning about them. It would be great if I could work on this, as it would give me exposure to both ML and distributed workflows (both of which I really enjoy :D). However, I am not sure I'll be able to work at a very fast pace, so if this is urgent (or not doable by a beginner), someone else can please take it up; if not, I'd love to work on it :)
@vfdev-5 Your idea is to use ray.tune as in the doc you mentioned? I mean the experiment tool?
@Devanshu24 if so, the baseline for this should be our CIFAR distributed training use case. Please see https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10
If you are motivated to learn about distributed training, why not have a look at the link above? Before going further, it would be important to be comfortable with this. What do you think?
Thanks for the reply @sdesrozis !
To confirm I am understanding correctly: we want to use ray.tune and the other distributed utilities provided by Ray, and see how it performs in comparison to the CIFAR example already in Ignite (https://github.com/pytorch/ignite/tree/master/examples/contrib/cifar10).
Correct?
If so, then I completely agree. I'll start by going through the Ignite example, hopefully make some headway, and then start on the Ray implementation! :D
The idea is to port this example, https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/cifar10_pytorch.py, as a simple script file at examples/contrib/cifar10_ray_tune:
- use Ignite for training and validation
- use Ray Tune for hyperparameter tuning

A great addition would also be a PR to the Ray docs with the example. See the sketch below.
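A minimal sketch of such a port, assuming the Ray Tune function API (`tune.report`); `Net` and `get_data_loaders` are placeholders standing in for the CIFAR example's model and data helpers, not a final design:

```python
import torch
import torch.nn as nn
from torch.optim import SGD

from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss

from ray import tune


def train_cifar(config):
    # `Net` and `get_data_loaders` are placeholders from the CIFAR example
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Net().to(device)
    optimizer = SGD(model.parameters(), lr=config["lr"], momentum=config["momentum"])
    criterion = nn.CrossEntropyLoss()
    train_loader, val_loader = get_data_loaders(config["batch_size"])

    trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
    evaluator = create_supervised_evaluator(
        model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)}, device=device
    )

    @trainer.on(Events.EPOCH_COMPLETED)
    def report_to_tune(engine):
        # Run validation and hand the metrics to Ray Tune so that
        # schedulers such as ASHA can stop bad trials early
        evaluator.run(val_loader)
        metrics = evaluator.state.metrics
        tune.report(loss=metrics["loss"], mean_accuracy=metrics["accuracy"])

    trainer.run(train_loader, max_epochs=config["epochs"])
```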
Can't we do it the same way Ray is integrated with PL (PyTorch Lightning), i.e. by creating callbacks?
Could you detail your idea?
https://docs.ray.io/en/master/tune/tutorials/tune-pytorch-lightning.html#training-with-gpus
Similar to the above, here is an abstract implementation:
import sys

import torch
import torch.nn as nn
from torch.optim import SGD

from ignite.engine import create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss
from ignite.utils import setup_logger

# proposed integration module; does not exist in Ray yet
from ray.tune.integration.pytorch_ignite import TuneReportCallback


def run(train_batch_size, val_batch_size, epochs, lr, momentum, log_dir):
    train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)
    model = Net()

    # PL-style pseudocode showing where the callback would hook in:
    # trainer = pl.Trainer(
    #     ...,
    #     callbacks=[
    #         TuneReportCallback(
    #             {"loss": "ptl/val_loss", "mean_accuracy": "ptl/val_accuracy"},
    #             on="validation_end",
    #         )
    #     ],
    # )

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda"
    model.to(device)  # Move model before creating optimizer
    optimizer = SGD(model.parameters(), lr=lr, momentum=momentum)
    criterion = nn.CrossEntropyLoss()
    trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
    trainer.logger = setup_logger("Trainer")

    if sys.version_info > (3,):
        from ignite.contrib.metrics.gpu_info import GpuInfo

        try:
            GpuInfo().attach(trainer)
        except RuntimeError:
            print(
                "INFO: By default, in this example it is possible to log GPU information (used memory, utilization). "
                "As there is no pynvml python package installed, GPU information won't be logged. Otherwise, please "
                "install it : `pip install pynvml`"
            )

    metrics = {"accuracy": Accuracy(), "loss": Loss(criterion)}
    train_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    train_evaluator.logger = setup_logger("Train Evaluator")
    validation_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    validation_evaluator.logger = setup_logger("Val Evaluator")
import os
import shutil
import tempfile

from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler


def tune_mnist_asha(num_samples=10, num_epochs=10, gpus_per_trial=0):
    data_dir = os.path.join(tempfile.gettempdir(), "mnist_data_")
    config = {
        "layer_1_size": tune.choice([32, 64, 128]),
        "layer_2_size": tune.choice([64, 128, 256]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
    }
    scheduler = ASHAScheduler(max_t=num_epochs, grace_period=1, reduction_factor=2)
    reporter = CLIReporter(
        parameter_columns=["layer_1_size", "layer_2_size", "lr", "batch_size"],
        metric_columns=["loss", "mean_accuracy", "training_iteration"],
    )
    analysis = tune.run(
        tune.with_parameters(
            train_mnist_tune,
            data_dir=data_dir,
            num_epochs=num_epochs,
            num_gpus=gpus_per_trial,
        ),
        resources_per_trial={"cpu": 1, "gpu": gpus_per_trial},
        metric="loss",
        mode="min",
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter,
        name="tune_mnist_asha",
    )
    print("Best hyperparameters found were: ", analysis.best_config)
    shutil.rmtree(data_dir)
Since most of the heavy lifting is done by Ray, we could extrapolate by adding a `pytorch_ignite` module to the `ray.tune.integration` namespace and implementing Ignite's particular way of attaching handlers there. That is what I was thinking!
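For concreteness, a rough sketch of what such an integration could look like on the Ignite side. `TuneReportHandler` is a hypothetical name, and the metric-mapping logic simply mirrors what Ray's PL `TuneReportCallback` does; none of this is an existing Ray API:

```python
from typing import Dict, List, Optional, Union

from ignite.engine import Engine
from ray import tune


class TuneReportHandler:
    """Hypothetical Ignite counterpart of Ray's PL ``TuneReportCallback``.

    Attach it to an evaluator; whenever the evaluator finishes a run, it
    reads the requested metrics from ``engine.state.metrics`` and forwards
    them to Ray Tune via ``tune.report``.
    """

    def __init__(self, metrics: Optional[Union[str, List[str], Dict[str, str]]] = None):
        if isinstance(metrics, str):
            metrics = [metrics]
        self._metrics = metrics

    def __call__(self, engine: Engine) -> None:
        state_metrics = engine.state.metrics
        if not self._metrics:
            report = dict(state_metrics)  # report everything the evaluator computed
        elif isinstance(self._metrics, dict):
            # mapping {name reported to Tune: Ignite metric name}, as in the PL callback
            report = {k: state_metrics[v] for k, v in self._metrics.items()}
        else:
            report = {k: state_metrics[k] for k in self._metrics}
        tune.report(**report)
```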
@Rajathbharadwaj thanks for the details. Yes, this would be great!
Awesome, I'll work on the integration. Any tips would be appreciated!
https://github.com/ray-project/ray/blob/master/python/ray/tune/integration/pytorch_lightning.py
Converting this to PyTorch Ignite's way of implementing handlers would be the approach.
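In Ignite terms, the hook the PL callback expresses with `on="validation_end"` roughly corresponds to attaching a handler to the validation evaluator's `Events.COMPLETED` event. A hedged usage sketch, reusing the hypothetical `TuneReportHandler` from above:

```python
from ignite.engine import Events

# Plays the role of PL's ``on="validation_end"`` hook: report to Tune
# every time the validation evaluator finishes a run.
validation_evaluator.add_event_handler(
    Events.COMPLETED,
    TuneReportHandler({"loss": "loss", "mean_accuracy": "accuracy"}),
)
```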
@Rajathbharadwaj any updates on this port?
Hey @vfdev-5, I got a bit held up. But I'm working on it. Will ping you.
@Rajathbharadwaj are you still working on this issue?
Hey @vfdev-5, if no one else is working on this, can I pick it up?
Sure, go ahead. Thanks!