ValueError: 'accuracy' is not in list

Open srinandan opened this issue 2 years ago • 1 comments

💡 Your Question

When running a training job with YOLO-NAS, I get an error towards the end of training (or so I presume).

This is the call where the error occurs (line 156 in the code):

    trainer.train(
        model=model,
        training_params=train_params,
        train_loader=train_data,
        valid_loader=val_data
    )

Error Stack:

[2023-06-12 06:04:14] INFO - base_sg_logger.py - [CLEANUP] - Successfully stopped system monitoring process
[2023-06-12 06:04:14] ERROR - sg_trainer_utils.py - Uncaught exception
Traceback (most recent call last):
  File "/app/run_module.py", line 186, in <module>
    main()
  File "/app/run_module.py", line 159, in main
    trainer.train(
  File "/opt/conda/lib/python3.10/site-packages/super_gradients/training/sg_trainer/sg_trainer.py", line 1240, in train
    train_metrics_tuple = self._train_epoch(epoch=epoch, silent_mode=silent_mode)
  File "/opt/conda/lib/python3.10/site-packages/super_gradients/training/sg_trainer/sg_trainer.py", line 441, in _train_epoch
    loss, loss_log_items = self._get_losses(outputs, targets)
  File "/opt/conda/lib/python3.10/site-packages/super_gradients/training/sg_trainer/sg_trainer.py", line 485, in _get_losses
    self._init_monitored_items()
  File "/opt/conda/lib/python3.10/site-packages/super_gradients/training/sg_trainer/sg_trainer.py", line 499, in _init_monitored_items
    self.metric_idx_in_results_tuple = fuzzy_idx_in_list(self.metric_to_watch, self.loss_logging_items_names + get_metrics_titles(self.valid_metrics))
  File "/opt/conda/lib/python3.10/site-packages/super_gradients/training/utils/utils.py", line 226, in fuzzy_idx_in_list
    return [fuzzy_str(x) for x in lst].index(fuzzy_str(name))
ValueError: 'accuracy' is not in list
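
The last frame of the traceback can be reproduced in isolation. The sketch below is a minimal stand-in, not the library's code: the body of `fuzzy_str` is an assumption (the traceback only shows how it is called), and the `monitored` list is illustrative, combining loss component names with metric titles of the kind `DetectionMetrics_050` reports. It shows that a watched metric named `mAP@0.50` resolves to an index, while `accuracy` is absent and raises `ValueError`. Since the exception names `accuracy`, it may be worth checking which `metric_to_watch` value actually reaches `trainer.train` in the failing environment.

```python
import re
from typing import List


def fuzzy_str(s: str) -> str:
    # Assumed normalizer (not copied from super-gradients): drop every
    # non-alphanumeric character and lowercase what is left.
    return re.sub(r"[^a-zA-Z0-9]", "", s).lower()


def fuzzy_idx_in_list(name: str, lst: List[str]) -> int:
    # Same expression as the last frame of the traceback.
    return [fuzzy_str(x) for x in lst].index(fuzzy_str(name))


# Illustrative names only: the real list is built at runtime from the loss
# components and the configured valid_metrics_list, in an order not shown here.
monitored = [
    "PPYoloELoss/loss_cls",
    "PPYoloELoss/loss_iou",
    "PPYoloELoss/loss_dfl",
    "PPYoloELoss/loss",
    "Precision@0.50",
    "Recall@0.50",
    "mAP@0.50",
    "F1@0.50",
]

# A detection metric that is present resolves to its position in the list.
print(fuzzy_idx_in_list("mAP@0.50", monitored))  # 6

# A classification-style name is not present, so list.index raises ValueError.
try:
    fuzzy_idx_in_list("accuracy", monitored)
except ValueError as err:
    print("lookup failed:", err)
```

If the lookup fails like this, comparing the watched name against the monitored names before calling `trainer.train` is a cheap way to surface the mismatch early.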

Here are the training parameters:

    train_params = {
        "silent_mode": False,
        "average_best_models": True,
        "warmup_mode": "linear_epoch_step",
        "warmup_initial_lr": 1e-6,
        "lr_warmup_epochs": 3,
        "initial_lr": 5e-4,
        "lr_mode": "cosine",
        "cosine_final_lr_ratio": 0.1,
        "optimizer": "Adam",
        "optimizer_params": {"weight_decay": 0.0001},
        "zero_weight_decay_on_bias_and_bn": True,
        "ema": True,
        "ema_params": {"decay": 0.9, "decay_type": "threshold"},
        "max_epochs": args["epoch"],
        "mixed_precision": True,
        "loss": PPYoloELoss(
            use_static_assigner=False,
            num_classes=len(yaml_params["names"]),
            reg_max=16
        ),
        "valid_metrics_list": [
            DetectionMetrics_050(
                score_thres=0.1,
                top_k_predictions=300,
                num_cls=len(yaml_params["names"]),
                normalize_targets=True,
                post_prediction_callback=PPYoloEPostPredictionCallback(
                    score_threshold=0.01,
                    nms_top_k=1000,
                    max_predictions=300,
                    nms_threshold=0.7
                )
            )
        ],
        "metric_to_watch": "mAP@0.50"
    }

Versions

SuperGradient version = 3.1.1
torch = 1.13.1
Python = 3.10

I'm running this on Vertex AI (as a custom model), so I cannot run the Python script.

srinandan, Jun 12 '23 06:06

Update: I get the same error with the following versions:

SuperGradient version = 3.1.2
torch = 1.13.1
Python = 3.7

srinandan, Jun 13 '23 02:06

I was not able to reproduce the error using a simple test program when running against SG 3.1.3 or 3.1.2. Here is the code snippet I was using:

from super_gradients import Trainer, setup_device
from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050

from super_gradients.training.dataloaders import coco2017_train_yolo_nas, coco2017_val_yolo_nas
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback


def main():
    root_dir = "C:/DevelopG/Develop/GitHub/Deci/super-gradients-projects/tinycoco"
    train_data = coco2017_train_yolo_nas(dataset_params={"data_dir": root_dir}, dataloader_params={"num_workers": 0})
    val_data = coco2017_val_yolo_nas(dataset_params={"data_dir": root_dir}, dataloader_params={"num_workers": 0})
    print(len(train_data), len(val_data))

    model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco").cuda()

    train_params = {
        "silent_mode": False,
        "average_best_models": True,
        "warmup_mode": "linear_epoch_step",
        "warmup_initial_lr": 1e-6,
        "lr_warmup_epochs": 3,
        "initial_lr": 5e-4,
        "lr_mode": "cosine",
        "cosine_final_lr_ratio": 0.1,
        "optimizer": "Adam",
        "optimizer_params": {"weight_decay": 0.0001},
        "zero_weight_decay_on_bias_and_bn": True,
        "ema": True,
        "ema_params": {"decay": 0.9, "decay_type": "threshold"},
        "max_epochs": 10,
        "mixed_precision": True,
        "loss": PPYoloELoss(
            use_static_assigner=False,
            num_classes=80,
            reg_max=16
        ),
        "valid_metrics_list": [
            DetectionMetrics_050(
                score_thres=0.1,
                top_k_predictions=300,
                num_cls=80,
                normalize_targets=True,
                post_prediction_callback=PPYoloEPostPredictionCallback(
                    score_threshold=0.01,
                    nms_top_k=1000,
                    max_predictions=300,
                    nms_threshold=0.7
                )
            )
        ],
        "metric_to_watch": "mAP@0.50"
    }

    setup_device()
    trainer = Trainer("issue-1162", ckpt_root_dir="checkpoints")
    trainer.train(
        model=model,
        training_params=train_params,
        train_loader=train_data,
        valid_loader=val_data
    )

if __name__ == "__main__":
    main()

And here is the full output log:

C:\Users\ekhve\.conda\envs\sg-testing\python.exe C:\DevelopG\Develop\GitHub\Deci\super-gradients-projects\issue-1162\main.py 
The console stream is logged into C:\Users\ekhve\sg_logs\console.log
[2023-08-10 14:51:40] INFO - crash_tips_setup.py - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
[2023-08-10 14:51:41] WARNING - redirects.py - NOTE: Redirects are currently not supported in Windows or MacOs.
[2023-08-10 14:51:46] WARNING - env_sanity_check.py - Failed to verify operating system: Deci officially supports only Linux kernels. Some features may not work as expected.
WARNING: Logging before flag parsing goes to stderr.
W0810 14:51:46.874871  3972 env_sanity_check.py:30] Failed to verify operating system: Deci officially supports only Linux kernels. Some features may not work as expected.
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
Caching annotations: 100%|██████████| 32/32 [00:00<00:00, 1692.13it/s]
Caching annotations: 100%|██████████| 6/6 [00:00<00:00, 760.11it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
1 1
The console stream is now moved to checkpoints\issue-1162/console_Aug10_14_51_48.txt
[2023-08-10 14:51:52] INFO - sg_trainer.py - Using EMA with params {'decay': 0.9, 'decay_type': 'threshold'}
[2023-08-10 14:51:54] INFO - sg_trainer_utils.py - TRAINING PARAMETERS:
    - Mode:                         OFF
    - Number of GPUs:               1          (1 available on the machine)
    - Dataset size:                 30         (len(train_set))
    - Batch size per GPU:           25         (batch_size)
    - Batch Accumulate:             1          (batch_accumulate)
    - Total batch size:             25         (num_gpus * batch_size)
    - Effective Batch size:         25         (num_gpus * batch_size * batch_accumulate)
    - Iterations per epoch:         1          (len(train_loader))
    - Gradient updates per epoch:   1          (len(train_loader) / batch_accumulate)

[2023-08-10 14:51:54] INFO - sg_trainer.py - Started training for 10 epochs (0/9)

Train epoch 0: 100%|██████████| 1/1 [00:05<00:00,  5.89s/it, PPYoloELoss/loss=2.25, PPYoloELoss/loss_cls=1.2, PPYoloELoss/loss_dfl=1.02, PPYoloELoss/loss_iou=0.217, gpu_mem=8.83]
Validation epoch 0: 100%|██████████| 1/1 [00:00<00:00,  1.34it/s]
===========================================================
SUMMARY OF EPOCH 0
├── Training
│   ├── Ppyoloeloss/loss = 2.2521
│   ├── Ppyoloeloss/loss_cls = 1.1971
│   ├── Ppyoloeloss/loss_dfl = 1.0226
│   └── Ppyoloeloss/loss_iou = 0.2175
└── Validation
    ├── F1@0.50 = 0.3248
    ├── Map@0.50 = 0.6509
    ├── Ppyoloeloss/loss = 1.4454
    ├── Ppyoloeloss/loss_cls = 0.7288
    ├── Ppyoloeloss/loss_dfl = 0.7688
    ├── Ppyoloeloss/loss_iou = 0.1329
    ├── Precision@0.50 = 0.2307
    └── Recall@0.50 = 0.8663

===========================================================
[2023-08-10 14:52:01] INFO - base_sg_logger.py - Checkpoint saved in checkpoints\issue-1162\ckpt_best.pth
[2023-08-10 14:52:01] INFO - sg_trainer.py - Best checkpoint overriden: validation mAP@0.50: 0.650887131690979
Train epoch 1: 100%|██████████| 1/1 [00:01<00:00,  1.73s/it, PPYoloELoss/loss=2.17, PPYoloELoss/loss_cls=1.15, PPYoloELoss/loss_dfl=1.02, PPYoloELoss/loss_iou=0.204, gpu_mem=8.68]
Validation epoch 1: 100%|██████████| 1/1 [00:00<00:00,  1.72it/s]
===========================================================
SUMMARY OF EPOCH 1
├── Training
│   ├── Ppyoloeloss/loss = 2.1742
│   │   ├── Best until now = 2.2521 (↘ -0.0779)
│   │   └── Epoch N-1      = 2.2521 (↘ -0.0779)
│   ├── Ppyoloeloss/loss_cls = 1.1544
│   │   ├── Best until now = 1.1971 (↘ -0.0428)
│   │   └── Epoch N-1      = 1.1971 (↘ -0.0428)
│   ├── Ppyoloeloss/loss_dfl = 1.0179
│   │   ├── Best until now = 1.0226 (↘ -0.0046)
│   │   └── Epoch N-1      = 1.0226 (↘ -0.0046)
│   └── Ppyoloeloss/loss_iou = 0.2044
│       ├── Best until now = 0.2175 (↘ -0.0131)
│       └── Epoch N-1      = 0.2175 (↘ -0.0131)
└── Validation
    ├── F1@0.50 = 0.2527
    │   ├── Best until now = 0.3248 (↘ -0.0721)
    │   └── Epoch N-1      = 0.3248 (↘ -0.0721)
    ├── Map@0.50 = 0.6063
    │   ├── Best until now = 0.6509 (↘ -0.0445)
    │   └── Epoch N-1      = 0.6509 (↘ -0.0445)
    ├── Ppyoloeloss/loss = 1.512
    │   ├── Best until now = 1.4454 (↗ 0.0666)
    │   └── Epoch N-1      = 1.4454 (↗ 0.0666)
    ├── Ppyoloeloss/loss_cls = 0.7584
    │   ├── Best until now = 0.7288 (↗ 0.0296)
    │   └── Epoch N-1      = 0.7288 (↗ 0.0296)
    ├── Ppyoloeloss/loss_dfl = 0.7988
    │   ├── Best until now = 0.7688 (↗ 0.03)
    │   └── Epoch N-1      = 0.7688 (↗ 0.03)
    ├── Ppyoloeloss/loss_iou = 0.1417
    │   ├── Best until now = 0.1329 (↗ 0.0088)
    │   └── Epoch N-1      = 0.1329 (↗ 0.0088)
    ├── Precision@0.50 = 0.1782
    │   ├── Best until now = 0.2307 (↘ -0.0525)
    │   └── Epoch N-1      = 0.2307 (↘ -0.0525)
    └── Recall@0.50 = 0.7783
        ├── Best until now = 0.8663 (↘ -0.0881)
        └── Epoch N-1      = 0.8663 (↘ -0.0881)

===========================================================
Train epoch 2: 100%|██████████| 1/1 [00:01<00:00,  1.73s/it, PPYoloELoss/loss=1.95, PPYoloELoss/loss_cls=0.998, PPYoloELoss/loss_dfl=0.921, PPYoloELoss/loss_iou=0.196, gpu_mem=8.72]
Validation epoch 2: 100%|██████████| 1/1 [00:00<00:00,  2.02it/s]
===========================================================
SUMMARY OF EPOCH 2
├── Training
│   ├── Ppyoloeloss/loss = 1.9485
│   │   ├── Best until now = 2.1742 (↘ -0.2258)
│   │   └── Epoch N-1      = 2.1742 (↘ -0.2258)
│   ├── Ppyoloeloss/loss_cls = 0.9977
│   │   ├── Best until now = 1.1544 (↘ -0.1567)
│   │   └── Epoch N-1      = 1.1544 (↘ -0.1567)
│   ├── Ppyoloeloss/loss_dfl = 0.9206
│   │   ├── Best until now = 1.0179 (↘ -0.0973)
│   │   └── Epoch N-1      = 1.0179 (↘ -0.0973)
│   └── Ppyoloeloss/loss_iou = 0.1962
│       ├── Best until now = 0.2044 (↘ -0.0082)
│       └── Epoch N-1      = 0.2044 (↘ -0.0082)
└── Validation
    ├── F1@0.50 = 0.2031
    │   ├── Best until now = 0.3248 (↘ -0.1217)
    │   └── Epoch N-1      = 0.2527 (↘ -0.0496)
    ├── Map@0.50 = 0.3472
    │   ├── Best until now = 0.6509 (↘ -0.3036)
    │   └── Epoch N-1      = 0.6063 (↘ -0.2591)
    ├── Ppyoloeloss/loss = 1.9031
    │   ├── Best until now = 1.4454 (↗ 0.4576)
    │   └── Epoch N-1      = 1.512  (↗ 0.391)
    ├── Ppyoloeloss/loss_cls = 1.047
    │   ├── Best until now = 0.7288 (↗ 0.3183)
    │   └── Epoch N-1      = 0.7584 (↗ 0.2887)
    ├── Ppyoloeloss/loss_dfl = 0.8942
    │   ├── Best until now = 0.7688 (↗ 0.1254)
    │   └── Epoch N-1      = 0.7988 (↗ 0.0954)
    ├── Ppyoloeloss/loss_iou = 0.1636
    │   ├── Best until now = 0.1329 (↗ 0.0307)
    │   └── Epoch N-1      = 0.1417 (↗ 0.0219)
    ├── Precision@0.50 = 0.1469
    │   ├── Best until now = 0.2307 (↘ -0.0838)
    │   └── Epoch N-1      = 0.1782 (↘ -0.0313)
    └── Recall@0.50 = 0.5374
        ├── Best until now = 0.8663 (↘ -0.3289)
        └── Epoch N-1      = 0.7783 (↘ -0.2408)

===========================================================
Train epoch 3: 100%|██████████| 1/1 [00:01<00:00,  1.75s/it, PPYoloELoss/loss=2.17, PPYoloELoss/loss_cls=1.08, PPYoloELoss/loss_dfl=1.02, PPYoloELoss/loss_iou=0.232, gpu_mem=8.76]
Validation epoch 3: 100%|██████████| 1/1 [00:00<00:00,  2.29it/s]
===========================================================
SUMMARY OF EPOCH 3
├── Training
│   ├── Ppyoloeloss/loss = 2.1715
│   │   ├── Best until now = 1.9485 (↗ 0.223)
│   │   └── Epoch N-1      = 1.9485 (↗ 0.223)
│   ├── Ppyoloeloss/loss_cls = 1.0823
│   │   ├── Best until now = 0.9977 (↗ 0.0846)
│   │   └── Epoch N-1      = 0.9977 (↗ 0.0846)
│   ├── Ppyoloeloss/loss_dfl = 1.0169
│   │   ├── Best until now = 0.9206 (↗ 0.0963)
│   │   └── Epoch N-1      = 0.9206 (↗ 0.0963)
│   └── Ppyoloeloss/loss_iou = 0.2323
│       ├── Best until now = 0.1962 (↗ 0.0361)
│       └── Epoch N-1      = 0.1962 (↗ 0.0361)
└── Validation
    ├── F1@0.50 = 0.0893
    │   ├── Best until now = 0.3248 (↘ -0.2355)
    │   └── Epoch N-1      = 0.2031 (↘ -0.1138)
    ├── Map@0.50 = 0.1076
    │   ├── Best until now = 0.6509 (↘ -0.5433)
    │   └── Epoch N-1      = 0.3472 (↘ -0.2396)
    ├── Ppyoloeloss/loss = 2.6456
    │   ├── Best until now = 1.4454 (↗ 1.2002)
    │   └── Epoch N-1      = 1.9031 (↗ 0.7425)
    ├── Ppyoloeloss/loss_cls = 1.5227
    │   ├── Best until now = 0.7288 (↗ 0.794)
    │   └── Epoch N-1      = 1.047  (↗ 0.4757)
    ├── Ppyoloeloss/loss_dfl = 1.1418
    │   ├── Best until now = 0.7688 (↗ 0.373)
    │   └── Epoch N-1      = 0.8942 (↗ 0.2475)
    ├── Ppyoloeloss/loss_iou = 0.2208
    │   ├── Best until now = 0.1329 (↗ 0.0879)
    │   └── Epoch N-1      = 0.1636 (↗ 0.0572)
    ├── Precision@0.50 = 0.1193
    │   ├── Best until now = 0.2307 (↘ -0.1114)
    │   └── Epoch N-1      = 0.1469 (↘ -0.0276)
    └── Recall@0.50 = 0.1665
        ├── Best until now = 0.8663 (↘ -0.6998)
        └── Epoch N-1      = 0.5374 (↘ -0.3709)

===========================================================
Train epoch 4: 100%|██████████| 1/1 [00:01<00:00,  1.74s/it, PPYoloELoss/loss=2.27, PPYoloELoss/loss_cls=1.21, PPYoloELoss/loss_dfl=1.02, PPYoloELoss/loss_iou=0.221, gpu_mem=8.69]
Validation epoch 4: 100%|██████████| 1/1 [00:00<00:00,  1.87it/s]
===========================================================
SUMMARY OF EPOCH 4
├── Training
│   ├── Ppyoloeloss/loss = 2.2669
│   │   ├── Best until now = 1.9485 (↗ 0.3184)
│   │   └── Epoch N-1      = 2.1715 (↗ 0.0954)
│   ├── Ppyoloeloss/loss_cls = 1.2055
│   │   ├── Best until now = 0.9977 (↗ 0.2078)
│   │   └── Epoch N-1      = 1.0823 (↗ 0.1232)
│   ├── Ppyoloeloss/loss_dfl = 1.018
│   │   ├── Best until now = 0.9206 (↗ 0.0973)
│   │   └── Epoch N-1      = 1.0169 (↗ 0.0011)
│   └── Ppyoloeloss/loss_iou = 0.221
│       ├── Best until now = 0.1962 (↗ 0.0248)
│       └── Epoch N-1      = 0.2323 (↘ -0.0113)
└── Validation
    ├── F1@0.50 = 0.0463
    │   ├── Best until now = 0.3248 (↘ -0.2785)
    │   └── Epoch N-1      = 0.0893 (↘ -0.0429)
    ├── Map@0.50 = 0.0511
    │   ├── Best until now = 0.6509 (↘ -0.5998)
    │   └── Epoch N-1      = 0.1076 (↘ -0.0565)
    ├── Ppyoloeloss/loss = 3.1681
    │   ├── Best until now = 1.4454 (↗ 1.7227)
    │   └── Epoch N-1      = 2.6456 (↗ 0.5225)
    ├── Ppyoloeloss/loss_cls = 1.8033
    │   ├── Best until now = 0.7288 (↗ 1.0745)
    │   └── Epoch N-1      = 1.5227 (↗ 0.2806)
    ├── Ppyoloeloss/loss_dfl = 1.3731
    │   ├── Best until now = 0.7688 (↗ 0.6043)
    │   └── Epoch N-1      = 1.1418 (↗ 0.2313)
    ├── Ppyoloeloss/loss_iou = 0.2713
    │   ├── Best until now = 0.1329 (↗ 0.1384)
    │   └── Epoch N-1      = 0.2208 (↗ 0.0505)
    ├── Precision@0.50 = 0.0392
    │   ├── Best until now = 0.2307 (↘ -0.1915)
    │   └── Epoch N-1      = 0.1193 (↘ -0.0801)
    └── Recall@0.50 = 0.102
        ├── Best until now = 0.8663 (↘ -0.7643)
        └── Epoch N-1      = 0.1665 (↘ -0.0645)

===========================================================
Train epoch 5: 100%|██████████| 1/1 [00:01<00:00,  1.75s/it, PPYoloELoss/loss=2.19, PPYoloELoss/loss_cls=1.09, PPYoloELoss/loss_dfl=1.08, PPYoloELoss/loss_iou=0.223, gpu_mem=8.73]
Validation epoch 5: 100%|██████████| 1/1 [00:00<00:00,  2.73it/s]
===========================================================
SUMMARY OF EPOCH 5
├── Training
│   ├── Ppyoloeloss/loss = 2.1937
│   │   ├── Best until now = 1.9485 (↗ 0.2452)
│   │   └── Epoch N-1      = 2.2669 (↘ -0.0732)
│   ├── Ppyoloeloss/loss_cls = 1.0936
│   │   ├── Best until now = 0.9977 (↗ 0.096)
│   │   └── Epoch N-1      = 1.2055 (↘ -0.1119)
│   ├── Ppyoloeloss/loss_dfl = 1.0847
│   │   ├── Best until now = 0.9206 (↗ 0.1641)
│   │   └── Epoch N-1      = 1.018  (↗ 0.0667)
│   └── Ppyoloeloss/loss_iou = 0.2231
│       ├── Best until now = 0.1962 (↗ 0.0269)
│       └── Epoch N-1      = 0.221  (↗ 0.0021)
└── Validation
    ├── F1@0.50 = 0.0153
    │   ├── Best until now = 0.3248 (↘ -0.3095)
    │   └── Epoch N-1      = 0.0463 (↘ -0.0311)
    ├── Map@0.50 = 0.0094
    │   ├── Best until now = 0.6509 (↘ -0.6415)
    │   └── Epoch N-1      = 0.0511 (↘ -0.0417)
    ├── Ppyoloeloss/loss = 3.423
    │   ├── Best until now = 1.4454 (↗ 1.9776)
    │   └── Epoch N-1      = 3.1681 (↗ 0.2549)
    ├── Ppyoloeloss/loss_cls = 1.9421
    │   ├── Best until now = 0.7288 (↗ 1.2134)
    │   └── Epoch N-1      = 1.8033 (↗ 0.1388)
    ├── Ppyoloeloss/loss_dfl = 1.4724
    │   ├── Best until now = 0.7688 (↗ 0.7036)
    │   └── Epoch N-1      = 1.3731 (↗ 0.0993)
    ├── Ppyoloeloss/loss_iou = 0.2979
    │   ├── Best until now = 0.1329 (↗ 0.165)
    │   └── Epoch N-1      = 0.2713 (↗ 0.0266)
    ├── Precision@0.50 = 0.0726
    │   ├── Best until now = 0.2307 (↘ -0.1581)
    │   └── Epoch N-1      = 0.0392 (↗ 0.0334)
    └── Recall@0.50 = 0.0271
        ├── Best until now = 0.8663 (↘ -0.8393)
        └── Epoch N-1      = 0.102  (↘ -0.075)

===========================================================
Train epoch 6: 100%|██████████| 1/1 [00:01<00:00,  1.78s/it, PPYoloELoss/loss=2.34, PPYoloELoss/loss_cls=1.19, PPYoloELoss/loss_dfl=1.14, PPYoloELoss/loss_iou=0.229, gpu_mem=8.88]
Validation epoch 6: 100%|██████████| 1/1 [00:00<00:00,  2.76it/s]
===========================================================
SUMMARY OF EPOCH 6
├── Training
│   ├── Ppyoloeloss/loss = 2.3372
│   │   ├── Best until now = 1.9485 (↗ 0.3887)
│   │   └── Epoch N-1      = 2.1937 (↗ 0.1435)
│   ├── Ppyoloeloss/loss_cls = 1.1938
│   │   ├── Best until now = 0.9977 (↗ 0.1961)
│   │   └── Epoch N-1      = 1.0936 (↗ 0.1002)
│   ├── Ppyoloeloss/loss_dfl = 1.142
│   │   ├── Best until now = 0.9206 (↗ 0.2214)
│   │   └── Epoch N-1      = 1.0847 (↗ 0.0573)
│   └── Ppyoloeloss/loss_iou = 0.229
│       ├── Best until now = 0.1962 (↗ 0.0328)
│       └── Epoch N-1      = 0.2231 (↗ 0.0059)
└── Validation
    ├── F1@0.50 = 0.0077
    │   ├── Best until now = 0.3248 (↘ -0.3171)
    │   └── Epoch N-1      = 0.0153 (↘ -0.0076)
    ├── Map@0.50 = 0.0119
    │   ├── Best until now = 0.6509 (↘ -0.639)
    │   └── Epoch N-1      = 0.0094 (↗ 0.0025)
    ├── Ppyoloeloss/loss = 3.5236
    │   ├── Best until now = 1.4454 (↗ 2.0782)
    │   └── Epoch N-1      = 3.423  (↗ 0.1006)
    ├── Ppyoloeloss/loss_cls = 1.9936
    │   ├── Best until now = 0.7288 (↗ 1.2648)
    │   └── Epoch N-1      = 1.9421 (↗ 0.0515)
    ├── Ppyoloeloss/loss_dfl = 1.4912
    │   ├── Best until now = 0.7688 (↗ 0.7224)
    │   └── Epoch N-1      = 1.4724 (↗ 0.0188)
    ├── Ppyoloeloss/loss_iou = 0.3138
    │   ├── Best until now = 0.1329 (↗ 0.1809)
    │   └── Epoch N-1      = 0.2979 (↗ 0.0159)
    ├── Precision@0.50 = 0.0158
    │   ├── Best until now = 0.2307 (↘ -0.2149)
    │   └── Epoch N-1      = 0.0726 (↘ -0.0568)
    └── Recall@0.50 = 0.0236
        ├── Best until now = 0.8663 (↘ -0.8428)
        └── Epoch N-1      = 0.0271 (↘ -0.0035)

===========================================================
Train epoch 7: 100%|██████████| 1/1 [00:01<00:00,  1.66s/it, PPYoloELoss/loss=2.26, PPYoloELoss/loss_cls=1.12, PPYoloELoss/loss_dfl=1.08, PPYoloELoss/loss_iou=0.237, gpu_mem=8.97]
Validation epoch 7: 100%|██████████| 1/1 [00:00<00:00,  3.11it/s]
===========================================================
SUMMARY OF EPOCH 7
├── Training
│   ├── Ppyoloeloss/loss = 2.2562
│   │   ├── Best until now = 1.9485 (↗ 0.3077)
│   │   └── Epoch N-1      = 2.3372 (↘ -0.081)
│   ├── Ppyoloeloss/loss_cls = 1.1237
│   │   ├── Best until now = 0.9977 (↗ 0.126)
│   │   └── Epoch N-1      = 1.1938 (↘ -0.0701)
│   ├── Ppyoloeloss/loss_dfl = 1.0786
│   │   ├── Best until now = 0.9206 (↗ 0.158)
│   │   └── Epoch N-1      = 1.142  (↘ -0.0634)
│   └── Ppyoloeloss/loss_iou = 0.2373
│       ├── Best until now = 0.1962 (↗ 0.0411)
│       └── Epoch N-1      = 0.229  (↗ 0.0083)
└── Validation
    ├── F1@0.50 = 0.0062
    │   ├── Best until now = 0.3248 (↘ -0.3186)
    │   └── Epoch N-1      = 0.0077 (↘ -0.0015)
    ├── Map@0.50 = 0.0053
    │   ├── Best until now = 0.6509 (↘ -0.6456)
    │   └── Epoch N-1      = 0.0119 (↘ -0.0066)
    ├── Ppyoloeloss/loss = 3.5213
    │   ├── Best until now = 1.4454 (↗ 2.0759)
    │   └── Epoch N-1      = 3.5236 (↘ -0.0023)
    ├── Ppyoloeloss/loss_cls = 1.985
    │   ├── Best until now = 0.7288 (↗ 1.2563)
    │   └── Epoch N-1      = 1.9936 (↘ -0.0086)
    ├── Ppyoloeloss/loss_dfl = 1.4429
    │   ├── Best until now = 0.7688 (↗ 0.6741)
    │   └── Epoch N-1      = 1.4912 (↘ -0.0483)
    ├── Ppyoloeloss/loss_iou = 0.326
    │   ├── Best until now = 0.1329 (↗ 0.193)
    │   └── Epoch N-1      = 0.3138 (↗ 0.0122)
    ├── Precision@0.50 = 0.0144
    │   ├── Best until now = 0.2307 (↘ -0.2162)
    │   └── Epoch N-1      = 0.0158 (↘ -0.0014)
    └── Recall@0.50 = 0.0171
        ├── Best until now = 0.8663 (↘ -0.8492)
        └── Epoch N-1      = 0.0236 (↘ -0.0064)

===========================================================
Train epoch 8: 100%|██████████| 1/1 [00:01<00:00,  1.70s/it, PPYoloELoss/loss=2.02, PPYoloELoss/loss_cls=1.01, PPYoloELoss/loss_dfl=1.01, PPYoloELoss/loss_iou=0.201, gpu_mem=8.76]
Validation epoch 8: 100%|██████████| 1/1 [00:00<00:00,  2.88it/s]
===========================================================
SUMMARY OF EPOCH 8
├── Training
│   ├── Ppyoloeloss/loss = 2.0161
│   │   ├── Best until now = 1.9485 (↗ 0.0677)
│   │   └── Epoch N-1      = 2.2562 (↘ -0.24)
│   ├── Ppyoloeloss/loss_cls = 1.0092
│   │   ├── Best until now = 0.9977 (↗ 0.0115)
│   │   └── Epoch N-1      = 1.1237 (↘ -0.1145)
│   ├── Ppyoloeloss/loss_dfl = 1.0092
│   │   ├── Best until now = 0.9206 (↗ 0.0886)
│   │   └── Epoch N-1      = 1.0786 (↘ -0.0694)
│   └── Ppyoloeloss/loss_iou = 0.2009
│       ├── Best until now = 0.1962 (↗ 0.0047)
│       └── Epoch N-1      = 0.2373 (↘ -0.0363)
└── Validation
    ├── F1@0.50 = 0.0107
    │   ├── Best until now = 0.3248 (↘ -0.3141)
    │   └── Epoch N-1      = 0.0062 (↗ 0.0045)
    ├── Map@0.50 = 0.006
    │   ├── Best until now = 0.6509 (↘ -0.6449)
    │   └── Epoch N-1      = 0.0053 (↗ 0.0007)
    ├── Ppyoloeloss/loss = 3.6864
    │   ├── Best until now = 1.4454 (↗ 2.241)
    │   └── Epoch N-1      = 3.5213 (↗ 0.1651)
    ├── Ppyoloeloss/loss_cls = 2.1107
    │   ├── Best until now = 0.7288 (↗ 1.382)
    │   └── Epoch N-1      = 1.985  (↗ 0.1257)
    ├── Ppyoloeloss/loss_dfl = 1.4626
    │   ├── Best until now = 0.7688 (↗ 0.6938)
    │   └── Epoch N-1      = 1.4429 (↗ 0.0198)
    ├── Ppyoloeloss/loss_iou = 0.3377
    │   ├── Best until now = 0.1329 (↗ 0.2048)
    │   └── Epoch N-1      = 0.326  (↗ 0.0118)
    ├── Precision@0.50 = 0.0087
    │   ├── Best until now = 0.2307 (↘ -0.222)
    │   └── Epoch N-1      = 0.0144 (↘ -0.0057)
    └── Recall@0.50 = 0.0171
        ├── Best until now = 0.8663 (↘ -0.8492)
        └── Epoch N-1      = 0.0171 (= 0.0)

===========================================================
Train epoch 9: 100%|██████████| 1/1 [00:01<00:00,  1.77s/it, PPYoloELoss/loss=2.06, PPYoloELoss/loss_cls=1.01, PPYoloELoss/loss_dfl=1.05, PPYoloELoss/loss_iou=0.213, gpu_mem=8.69]
Validation epoch 9: 100%|██████████| 1/1 [00:00<00:00,  2.76it/s]
===========================================================
SUMMARY OF EPOCH 9
├── Training
│   ├── Ppyoloeloss/loss = 2.0629
│   │   ├── Best until now = 1.9485 (↗ 0.1144)
│   │   └── Epoch N-1      = 2.0161 (↗ 0.0468)
│   ├── Ppyoloeloss/loss_cls = 1.0065
│   │   ├── Best until now = 0.9977 (↗ 0.0088)
│   │   └── Epoch N-1      = 1.0092 (↘ -0.0027)
│   ├── Ppyoloeloss/loss_dfl = 1.05
│   │   ├── Best until now = 0.9206 (↗ 0.1294)
│   │   └── Epoch N-1      = 1.0092 (↗ 0.0408)
│   └── Ppyoloeloss/loss_iou = 0.2126
│       ├── Best until now = 0.1962 (↗ 0.0164)
│       └── Epoch N-1      = 0.2009 (↗ 0.0116)
└── Validation
    ├── F1@0.50 = 0.0106
    │   ├── Best until now = 0.3248 (↘ -0.3142)
    │   └── Epoch N-1      = 0.0107 (↘ -0.0)
    ├── Map@0.50 = 0.006
    │   ├── Best until now = 0.6509 (↘ -0.6449)
    │   └── Epoch N-1      = 0.006  (↘ -0.0)
    ├── Ppyoloeloss/loss = 3.8045
    │   ├── Best until now = 1.4454 (↗ 2.3591)
    │   └── Epoch N-1      = 3.6864 (↗ 0.1181)
    ├── Ppyoloeloss/loss_cls = 2.2185
    │   ├── Best until now = 0.7288 (↗ 1.4898)
    │   └── Epoch N-1      = 2.1107 (↗ 0.1078)
    ├── Ppyoloeloss/loss_dfl = 1.5018
    │   ├── Best until now = 0.7688 (↗ 0.733)
    │   └── Epoch N-1      = 1.4626 (↗ 0.0392)
    ├── Ppyoloeloss/loss_iou = 0.334
    │   ├── Best until now = 0.1329 (↗ 0.2011)
    │   └── Epoch N-1      = 0.3377 (↘ -0.0037)
    ├── Precision@0.50 = 0.0253
    │   ├── Best until now = 0.2307 (↘ -0.2053)
    │   └── Epoch N-1      = 0.0087 (↗ 0.0166)
    └── Recall@0.50 = 0.0109
        ├── Best until now = 0.8663 (↘ -0.8554)
        └── Epoch N-1      = 0.0171 (↘ -0.0062)

===========================================================
[2023-08-10 14:52:58] INFO - sg_trainer.py - RUNNING ADDITIONAL TEST ON THE AVERAGED MODEL...
Validation epoch 10: 100%|██████████| 1/1 [00:00<00:00,  2.70it/s]
===========================================================
SUMMARY OF EPOCH 10
├── Training
│   ├── Ppyoloeloss/loss = 2.0629
│   │   ├── Best until now = 1.9485 (↗ 0.1144)
│   │   └── Epoch N-1      = 2.0161 (↗ 0.0468)
│   ├── Ppyoloeloss/loss_cls = 1.0065
│   │   ├── Best until now = 0.9977 (↗ 0.0088)
│   │   └── Epoch N-1      = 1.0092 (↘ -0.0027)
│   ├── Ppyoloeloss/loss_dfl = 1.05
│   │   ├── Best until now = 0.9206 (↗ 0.1294)
│   │   └── Epoch N-1      = 1.0092 (↗ 0.0408)
│   └── Ppyoloeloss/loss_iou = 0.2126
│       ├── Best until now = 0.1962 (↗ 0.0164)
│       └── Epoch N-1      = 0.2009 (↗ 0.0116)
└── Validation
    ├── F1@0.50 = 0.0568
    │   ├── Best until now = 0.3248 (↘ -0.268)
    │   └── Epoch N-1      = 0.0106 (↗ 0.0462)
    ├── Map@0.50 = 0.0732
    │   ├── Best until now = 0.6509 (↘ -0.5776)
    │   └── Epoch N-1      = 0.006  (↗ 0.0673)
    ├── Ppyoloeloss/loss = 3.0419
    │   ├── Best until now = 1.4454 (↗ 1.5965)
    │   └── Epoch N-1      = 3.8045 (↘ -0.7626)
    ├── Ppyoloeloss/loss_cls = 1.8235
    │   ├── Best until now = 0.7288 (↗ 1.0947)
    │   └── Epoch N-1      = 2.2185 (↘ -0.395)
    ├── Ppyoloeloss/loss_dfl = 1.2336
    │   ├── Best until now = 0.7688 (↗ 0.4648)
    │   └── Epoch N-1      = 1.5018 (↘ -0.2683)
    ├── Ppyoloeloss/loss_iou = 0.2407
    │   ├── Best until now = 0.1329 (↗ 0.1078)
    │   └── Epoch N-1      = 0.334  (↘ -0.0934)
    ├── Precision@0.50 = 0.1192
    │   ├── Best until now = 0.2307 (↘ -0.1115)
    │   └── Epoch N-1      = 0.0253 (↗ 0.0938)
    └── Recall@0.50 = 0.1023
        ├── Best until now = 0.8663 (↘ -0.764)
        └── Epoch N-1      = 0.0109 (↗ 0.0915)

===========================================================
[2023-08-10 14:52:59] INFO - base_sg_logger.py - [CLEANUP] - Successfully stopped system monitoring process

Process finished with exit code 0

BloodAxe, Aug 10 '23 11:08