
Add more labels to a custom trained model

Open rvryan67 opened this issue 1 year ago • 8 comments

💡 Your Question

I have a custom-trained model. It was trained on 13,000 images and labels and took 12 hours to train.

I want to add more training data (images and labels).

Is there a way to incrementally add a small amount of additional training data and re-run training without it taking 12 hours to complete?

Versions

No response

rvryan67 avatar Jan 08 '24 12:01 rvryan67

You can always load the weights of the trained model from the previous run and continue training from that state.

model = models.get(...., checkpoint_path=<ABSOLUTE_PATH_TO_CHECKPOINT_FROM_PREVIOUS_TRAINING>)

Or via the command line, if you are using YAML recipes:

python -m super_gradients.train_from_recipe --config-name=YOUR_RECIPES checkpoint_params.checkpoint_path=<ABSOLUTE_PATH_TO_CHECKPOINT_FROM_PREVIOUS_TRAINING>
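
For context, a minimal sketch of the full Python-API flow (the checkpoint path, num_classes, and dataloader names here are placeholders, not your exact code):

from super_gradients import Trainer
from super_gradients.training import models

trainer = Trainer(experiment_name="resume_finetune", ckpt_root_dir="checkpoints")

# Rebuild the same architecture and load the weights from the previous run.
model = models.get(
    "yolo_nas_l",                                   # must match the original architecture
    num_classes=2,                                  # must match the original head
    checkpoint_path="/abs/path/to/ckpt_best.pth",   # checkpoint from the previous training
)

# Continue training from that state with the extended dataset.
trainer.train(model=model, training_params=train_params, train_loader=train_data, valid_loader=val_data)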

BloodAxe avatar Jan 08 '24 12:01 BloodAxe

Hi @BloodAxe

Thanks for the suggestion. Is this what you mean?

model = models.get("yolo_nas_l", num_classes=2, checkpoint_path=r"custommodel/ckpt_best.pth").cuda()

trainer.train(model=model, training_params=train_params, train_loader=train_data, valid_loader=val_data)

rvryan67 avatar Jan 08 '24 13:01 rvryan67

Exactly

BloodAxe avatar Jan 08 '24 13:01 BloodAxe

The re-trained custom model is not giving me the results I expect.

The original custom model predicts correctly, i.e. it identifies an object with 0.9 confidence.

However, when I run the same prediction on the re-trained custom model I don't get any prediction at all.

I wonder if I'm missing something in my training_params:

train_params = {
    # ENABLING SILENT MODE
    'silent_mode': False,
    "average_best_models": True,
    "warmup_mode": "linear_epoch_step",
    "warmup_initial_lr": 1e-6,
    "lr_warmup_epochs": 3,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "Adam",
    "optimizer_params": {"weight_decay": 0.0001},
    "zero_weight_decay_on_bias_and_bn": True,
    "ema": True,
    "ema_params": {"decay": 0.9, "decay_type": "threshold"},
    "max_epochs": EPOCHS,
    "mixed_precision": True,
    "loss": PPYoloELoss(
        use_static_assigner=False,
        # NOTE: num_classes needs to be defined here
        num_classes=len(dataset_params['classes']),
        reg_max=16
    ),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            # NOTE: num_classes needs to be defined here
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],
    "metric_to_watch": 'mAP@0.50'
}

rvryan67 avatar Jan 08 '24 13:01 rvryan67

The provided snippet is not enough to go on. Please show the rest of the code, including the dataloader preparation (before and after you add more data), and the TensorBoard plots for the regular training and for the run with the additional data.
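
In the meantime, one quick sanity check is to run inference with a much lower confidence threshold and see whether the retrained model produces any boxes at all (a sketch; the image and checkpoint paths are placeholders, and conf lowers the score threshold at prediction time):

from super_gradients.training import models

model = models.get(
    "yolo_nas_l",
    num_classes=2,
    checkpoint_path="retrained/ckpt_best.pth",  # placeholder path to the re-trained weights
).cuda()

# If detections only appear at very low confidence, the re-training run likely
# disturbed the weights (e.g. too high a learning rate or too few epochs).
model.predict("sample.jpg", conf=0.05).show()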

BloodAxe avatar Jan 09 '24 06:01 BloodAxe

I'm using an AWS Sagemaker Training job to train the model.

Here is the code I use to create the Training Job

from sagemaker.estimator import Estimator
from sagemaker.pytorch import PyTorch
from sagemaker.session import TrainingInput

train_input = TrainingInput(dataset_s3_uri)

estimator = PyTorch(
    entry_point="train.py",
    role=role,
    source_dir="./yolo-nas-model-scripts",
    instance_count=1,
    instance_type='ml.g4dn.12xlarge',
    framework_version="1.13.1",
    py_version="py39",
    sagemaker_session=sagemaker_session,
    input_mode='File',  # FastFile causes an issue with writing the label cache
    output_path=dataset_s3_uri + '/output',
)

estimator.fit(train_input, job_name=job_name)
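
One thing worth noting for the resume approach on SageMaker: the previous checkpoint has to reach the training container somehow. A hypothetical sketch (the checkpoint-s3-uri hyperparameter name and the download logic are illustrative, not part of the attached train.py):

# In the notebook: SageMaker forwards hyperparameters to the entry point
# as command-line arguments, so the checkpoint location can travel that way.
estimator = PyTorch(
    entry_point="train.py",
    role=role,
    source_dir="./yolo-nas-model-scripts",
    instance_count=1,
    instance_type='ml.g4dn.12xlarge',
    framework_version="1.13.1",
    py_version="py39",
    sagemaker_session=sagemaker_session,
    input_mode='File',
    output_path=dataset_s3_uri + '/output',
    hyperparameters={"checkpoint-s3-uri": "s3://my-bucket/prev-job/ckpt_best.pth"},  # placeholder
)

# In train.py: download the checkpoint, then hand it to models.get(...).
import argparse
import boto3

parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint-s3-uri", type=str, default=None)
args, _ = parser.parse_known_args()

if args.checkpoint_s3_uri:
    bucket, key = args.checkpoint_s3_uri[len("s3://"):].split("/", 1)
    boto3.client("s3").download_file(bucket, key, "/tmp/ckpt_best.pth")
    # model = models.get("yolo_nas_l", num_classes=2, checkpoint_path="/tmp/ckpt_best.pth")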

train.py attached, renamed to train.txt.

rvryan67 avatar Jan 09 '24 16:01 rvryan67

@BloodAxe

Is there a way to incrementally add a small amount of additional training data and re-run training without it taking 12 hours to complete?

My original question quoted above ^^

I came across the following in this discussion: https://github.com/ultralytics/ultralytics/issues/4554#issuecomment-1695218721

Currently, YOLOv8 does not have a feature for incremental learning

Is the same true of YOLO-NAS?

rvryan67 avatar Jan 10 '24 14:01 rvryan67

So what you are looking for is continual learning: a technique that allows training a model on a few new data samples without forgetting the existing knowledge.

Unfortunately, at the moment we don't support this.
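
The usual workaround until then is plain fine-tuning on a mix of old and new data: resume from the checkpoint as discussed above, concatenate the original dataset with the new samples, and train for far fewer epochs at a reduced learning rate. A rough sketch with PyTorch's ConcatDataset (the dataset objects, epoch count, and learning rate are illustrative, and a real detection dataloader would also need the proper collate function):

from torch.utils.data import ConcatDataset, DataLoader

# Mixing old and new samples counters catastrophic forgetting; training on
# the new data alone tends to erase what the model already learned.
combined = ConcatDataset([original_dataset, new_dataset])  # placeholder datasets
train_loader = DataLoader(combined, batch_size=16, shuffle=True, num_workers=4)

# Resume from the previous best checkpoint and fine-tune briefly.
finetune_params = {**train_params, "max_epochs": 10, "initial_lr": 5e-5}  # illustrative values
trainer.train(
    model=model,               # loaded with checkpoint_path as earlier in the thread
    training_params=finetune_params,
    train_loader=train_loader,
    valid_loader=val_data,
)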

BloodAxe avatar Jan 18 '24 07:01 BloodAxe