[Bug]: Training of anomalib model on custom dataset is taking too long!

Open UTKARSH-VISCON opened this issue 1 year ago • 2 comments

Describe the bug

I am trying to train an anomalib model on my custom dataset, but it's taking too long to train (even after 3 days there were no results).

I am using the same code as provided in the anomalib docs:

from anomalib.data import Folder
from anomalib.models import Patchcore
from anomalib.engine import Engine

# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)

# Setup the datamodule
datamodule.setup()

# Create the model and engine
model = Patchcore()
engine = Engine(task="classification")

# Train a Patchcore model on the given datamodule
engine.train(datamodule=datamodule, model=model)

Output screen (it's just stuck at this):

┌───┬───────────────────────┬──────────
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼───────────
│ 0 │ model                 │ PatchcoreModel           │  643 K │ train │
│ 1 │ _transform            │ Compose                  │      0 │ train │
│ 2 │ normalization_metrics │ MetricCollection         │      0 │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │      0 │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │      0 │ train │
└───┴───────────────────────┴─────────────
Trainable params: 643 K
Non-trainable params: 0
Total params: 643 K
Total estimated model params size (MB): 2
Modules in train mode: 15
Modules in eval mode: 46

Dataset

Custom Dataset

Model

PatchCore

Steps to reproduce the behavior

  1. Install Anomalib
  2. Use the anomalib repo from GitHub
  3. Run the training code on the custom dataset.

OS information

  • OS: [Windows 11]
  • Python version: [3.10.0]
  • Anomalib version: [1.1.0]
  • PyTorch version: [2.2.2]
  • CUDA/cuDNN version: [11.8]
  • GPU models and configuration: [NVIDIA GeForce RTX 3050 Ti]
  • Any other relevant information: [I'm using a custom dataset]

Expected behavior

The model should get trained

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

# Import the datamodule
from anomalib.data import Folder

# Create the datamodule
datamodule = Folder(
    name="hazelnut_toy",
    root="datasets/hazelnut_toy",
    normal_dir="good",
    abnormal_dir="crack",
    task="classification",
)

# Setup the datamodule
datamodule.setup()

Logs

┌───┬───────────────────────┬──────────
│   │ Name                  │ Type                     │ Params │ Mode  │
├───┼───────────────────────┼───────────
│ 0 │ model                 │ PatchcoreModel           │  643 K │ train │
│ 1 │ _transform            │ Compose                  │      0 │ train │
│ 2 │ normalization_metrics │ MetricCollection         │      0 │ train │
│ 3 │ image_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 4 │ pixel_threshold       │ F1AdaptiveThreshold      │      0 │ train │
│ 5 │ image_metrics         │ AnomalibMetricCollection │      0 │ train │
│ 6 │ pixel_metrics         │ AnomalibMetricCollection │      0 │ train │
└───┴───────────────────────┴─────────────
Trainable params: 643 K                                                        
Non-trainable params: 0                                                        
Total params: 643 K                                                            
Total estimated model params size (MB): 2                                      
Modules in train mode: 15                                                      
Modules in eval mode: 46

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

UTKARSH-VISCON avatar Aug 24 '24 05:08 UTKARSH-VISCON

Hello, how big is your dataset and what resolution are the images? Both of these factors affect training time.

abc-125 avatar Aug 24 '24 17:08 abc-125

I have a total of 90 images in my dataset (900x900 resolution)

UTKARSH-VISCON avatar Aug 27 '24 05:08 UTKARSH-VISCON

Can you check whether it works with 256x256 images? There may be a different problem, especially since the output screen is stuck.

abc-125 avatar Sep 03 '24 17:09 abc-125
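To put the two resolutions in this thread side by side, here is a rough back-of-the-envelope sketch of how many patch embeddings Patchcore would have to hold before coreset sampling. It rests on several assumptions that the thread does not confirm: the default `wide_resnet50_2` backbone with `layer2` and `layer3` features (512 + 1024 channels on a stride-8 grid), float32 storage, images reaching the model at their native size with no resizing in between, and all 90 images counted as training images (an upper bound, since only the normal ones are used for training). The helper names are illustrative, not part of anomalib.

```python
import math

# Assumptions (not confirmed in the thread): wide_resnet50_2 backbone,
# layer2 + layer3 features on a stride-8 grid, stored as float32.
FEATURE_DIM = 512 + 1024
STRIDE = 8
BYTES_PER_FLOAT = 4


def patches_per_image(side: int) -> int:
    """Approximate number of patch embeddings for a square image."""
    grid = math.ceil(side / STRIDE)
    return grid * grid


def memory_bank_bytes(num_images: int, side: int) -> int:
    """Approximate size of the embedding bank before coreset subsampling."""
    return num_images * patches_per_image(side) * FEATURE_DIM * BYTES_PER_FLOAT


for side in (900, 256):
    total = memory_bank_bytes(num_images=90, side=side)
    print(
        f"{side}x{side}: {patches_per_image(side)} patches/image, "
        f"{total / 2**30:.2f} GiB before coreset sampling"
    )
```

Under these assumptions the 900x900 case needs roughly twelve times as many embeddings per image as 256x256, so the test suggested above is a cheap way to see whether resolution is the bottleneck or whether something else is wrong.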

@UTKARSH-VISCON, I don't think it is an Anomalib problem. Patchcore is computationally expensive and memory-intensive, especially during coreset sampling. As @abc-125 suggested, you could try reducing the image size to see if it helps.

samet-akcay avatar Sep 18 '24 16:09 samet-akcay
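For readers unfamiliar with the coreset sampling step mentioned above: it is typically a greedy k-center selection over all patch embeddings. The following is a minimal pure-Python sketch of that idea on toy 2-D points, not anomalib's actual implementation (which operates on tensors); the function name `k_center_greedy` and the toy data are made up for illustration.

```python
import math


def k_center_greedy(points, k, start=0):
    """Pick k indices; each new pick is the point farthest from
    everything selected so far (greedy k-center)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Choose the point that is currently worst covered.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # One full pass over all points for every selected sample.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical toy data: four points on a line.
toy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(k_center_greedy(toy, k=3))
```

Each selected sample requires a pass over every embedding, so selecting a fixed fraction of N embeddings costs on the order of N squared distance computations. That quadratic growth is why reducing the image size, and with it the number of patches per image, can shorten this step considerably.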