
More iterations than the configured setting

wt12318 opened this issue 2 years ago • 6 comments

Hi,

When I set num_iteration to 50, the actual number of iterations run is more than 50:

from mango import Tuner

# HYPERPARAMETERS and run_one_training are defined elsewhere
config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 50

tuner = Tuner(HYPERPARAMETERS, 
              objective=run_one_training,
              conf_dict=config) 
results = tuner.minimize()

MLflow shows that it has run 62 iterations: [MLflow screenshot]

wt12318 · May 30 '22 01:05

Hi, thanks for asking this question.

Internally, Mango runs a few random iterations to properly initialize the optimizer. The number of these random iterations defaults to 2, and you can change it with the config parameter 'initial_random'. So, in most cases, your total number of iterations will be num_iteration + initial_random.

However, this parameter is a suggestion to the optimizer, and in some cases it may run more random iterations to complete the initialization. This happens for problems where the variation in the objective value is very small, and Mango may internally decide to run more random iterations to make sure it finds good regions of the hyperparameter space. For most problems, setting initial_random will bound the total number of iterations as needed.

This may also happen when some of the random iterations fail and your objective function handles those failures, in which case Mango runs more random iterations to make sure that 2 random iterations succeeded.
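
For reference, a minimal sketch of how the evaluation count relates to the config, assuming the Tuner call from the original post and the parameter names mentioned above:

from mango import Tuner

config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 50
config["initial_random"] = 2  # random initialization runs before the Bayesian steps

tuner = Tuner(HYPERPARAMETERS,
              objective=run_one_training,
              conf_dict=config)
results = tuner.minimize()
# Typically this performs num_iteration + initial_random = 52 evaluations,
# unless Mango decides it needs extra random samples for initialization.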

sandeep-iitr · May 30 '22 01:05

Thank you

wt12318 · May 30 '22 02:05

Hi,

When I set initial_random to 1, it still runs more iterations than I set. Also, the total number of combinations of all my parameters is 36, but it runs more than 36 iterations. Why does this happen?

Thank you.

wt12318 · Jun 09 '22 10:06

Can you share more details about your parameter space and the definition of your objective function?

sandeep-iitr · Jun 09 '22 16:06

Thank you for the reply. Here are my objective function and parameter space:

# Imports for this snippet; DataLoader, TCRpMHCDataset, GNN, aaindex, device,
# count_parameters, train_one_epoch, test, and SIGNATURE are imported or
# defined elsewhere in the project.
import mlflow
import torch
from mango import Tuner, scheduler

@scheduler.parallel(n_jobs=36)
def run_one_training(**params):
    with mlflow.start_run() as run:
        # Log parameters used in this experiment
        for key in params.keys():
            mlflow.log_param(key, params[key])

        # Loading the dataset
        print("Loading dataset...")
        train_dataset = TCRpMHCDataset(root="/public/slst/home/wutao2/TCR_neo/data/", filename="train_dt.csv",aaindex=aaindex, test=False, val=False)
        test_dataset = TCRpMHCDataset(root="/public/slst/home/wutao2/TCR_neo/data/", filename="val_dt.csv", aaindex=aaindex, test=False, val=True)

        # Prepare training
        train_loader = DataLoader(train_dataset, batch_size=params["batch_size"], shuffle=True)
        test_loader = DataLoader(test_dataset, batch_size=params["batch_size"], shuffle=True)

        # Loading the model
        print("Loading model...")
        model_params = {k: v for k, v in params.items() if k.startswith("model_")}
        model = GNN(feature_size=train_dataset[0].x.shape[1], model_params=model_params) 
        model = model.to(device)
        print(f"Number of parameters: {count_parameters(model)}")
        mlflow.log_param("num_params", count_parameters(model))

        # pos_weight < 1 increases precision, > 1 increases recall (not used here)
        loss_fn = torch.nn.BCEWithLogitsLoss()
        optimizer = torch.optim.Adam(model.parameters(), 
                                    lr=params["learning_rate"],
                                    weight_decay=0)
        #scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=params["scheduler_gamma"])
        
        # Start training
        best_loss = 1000
        early_stopping_counter = 0
        for epoch in range(20): 
            if early_stopping_counter <= 5:  # early-stopping patience
                # Training
                model.train()
                loss = train_one_epoch(epoch, model, train_loader, optimizer, loss_fn)
                print(f"Epoch {epoch} | Train Loss {loss}")
                mlflow.log_metric(key="Train loss", value=float(loss), step=epoch)

                # Testing
                model.eval()
                if epoch % 1 == 0:
                    loss = test(epoch, model, test_loader, loss_fn)
                    print(f"Epoch {epoch} | Test Loss {loss}")
                    mlflow.log_metric(key="Test loss", value=float(loss), step=epoch)
                    
                    # Update best loss
                    if float(loss) < best_loss:
                        best_loss = loss
                        # Save the currently best model 
                        mlflow.pytorch.log_model(model, "model", signature=SIGNATURE)
                        
                        early_stopping_counter = 0
                    else:
                        early_stopping_counter += 1

            else:
                print("Early stopping due to no improvement.")
                return [best_loss]
    print(f"Finishing training with best test loss: {best_loss}")
    return [best_loss]

HYPERPARAMETERS = {
    "batch_size": [32,64,128],
    "learning_rate": [0.001,0.0001],
    "model_embedding_size": [32,64,128],
    "model_layers": [2,3],
    "model_dropout_rate": [0.5]
}

torch.set_num_threads(36)
torch.manual_seed(2022060801)
print("Running hyperparameter search...")
config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 36
config["initial_random"] = 1

tuner = Tuner(HYPERPARAMETERS, 
              run_one_training,
              config) 
results = tuner.minimize()
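
As a side note (a sketch added for reference, not part of the original report), the discrete space above contains 3 * 2 * 3 * 2 * 1 = 36 combinations, which is where the 36 in the question comes from:

from math import prod

# Count the number of distinct combinations in the HYPERPARAMETERS grid above
n_combinations = prod(len(values) for values in HYPERPARAMETERS.values())
print(n_combinations)  # 36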

[MLflow screenshot]

wt12318 · Jun 10 '22 00:06

Hi, thanks for providing the details. I have been a little busy with an immediate deadline for the last few days. I will work on reproducing this issue next week and will update you with a solution or more information.

sandeep-iitr · Jun 17 '22 19:06