DALI Add PyTorch DataLoader Evaluator plugin

Introduces a lightweight diagnostic tool for identifying data loading bottlenecks in PyTorch training pipelines.
This change adds the Loader Evaluator to the DALI PyTorch plugin, together with a Jupyter notebook tutorial, a documentation page, and tests:
- LoaderEvaluator class that wraps a PyTorch DataLoader with performance monitoring
- Two operation modes: 'log' (normal iteration with metrics) and 'replay' (cached batches that simulate ideal data loading performance)
- PerformanceMetrics class for detailed performance tracking and bottleneck analysis
- In-memory batch caching for replay mode to simulate ideal data loading
- Comprehensive test suite and documentation with an example notebook
The tool helps users compare real vs. ideal data loading performance and identify optimization opportunities.

Authored-by: Albert Wolant

Category:

New feature (non-breaking change which adds functionality)

Description:

Introduces a lightweight diagnostic tool for identifying data loading bottlenecks in PyTorch training pipelines.
This change adds the Loader Evaluator to the DALI PyTorch plugin, together with a Jupyter notebook tutorial, a documentation page, and tests:
- LoaderEvaluator class that wraps a PyTorch DataLoader with performance monitoring
- Two operation modes: 'log' (normal iteration with metrics) and 'replay' (cached batches that simulate ideal data loading performance)
- PerformanceMetrics class for detailed performance tracking and bottleneck analysis
- In-memory batch caching for replay mode to simulate ideal data loading
- Comprehensive test suite and documentation with an example notebook
The tool helps users compare real vs. ideal data loading performance and identify optimization opportunities.
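
For reviewers who want a mental model of the two modes, the following is a minimal pure-Python sketch of the idea, not the code added by this PR. The class name `SketchLoaderEvaluator`, the `num_cached_batches` parameter, and the exact metric keys are assumptions made for illustration; only the mode names, the `get_metrics()` call, and the modulo-based replay come from this thread.

```python
import time
from itertools import islice


class SketchLoaderEvaluator:
    """Hypothetical stand-in that mimics the log/replay idea (not the DALI class)."""

    def __init__(self, loader, mode="log", num_cached_batches=2):
        if mode not in ("log", "replay"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.loader = loader
        self.mode = mode
        self.batch_times = []
        self.cache = []
        if mode == "replay":
            # Replay yields as many batches as the real loader would,
            # so this sketch needs len(loader) to be defined.
            self.num_batches = len(loader)
            # Keep only the first few batches in memory.
            self.cache = list(islice(loader, num_cached_batches))

    def __iter__(self):
        self.batch_times = []
        if self.mode == "log":
            source = iter(self.loader)
        else:
            # Cycle over the cached batches: batch i comes from slot i % cache_size.
            source = (
                self.cache[i % len(self.cache)] for i in range(self.num_batches)
            )
        start = time.perf_counter()
        for batch in source:
            # Time spent waiting for the batch, excluding the consumer's work.
            self.batch_times.append(time.perf_counter() - start)
            yield batch
            start = time.perf_counter()

    def get_metrics(self):
        total = sum(self.batch_times)
        count = len(self.batch_times)
        return {
            "num_batches": count,
            "total_time": total,
            "avg_batch_time": total / count if count else 0.0,
        }


# A plain list of batches stands in for a real DataLoader here.
batches = [[1, 2], [3, 4], [5, 6]]
log_run = SketchLoaderEvaluator(batches, mode="log")
logged = list(log_run)
replay_run = SketchLoaderEvaluator(batches, mode="replay", num_cached_batches=2)
replayed = list(replay_run)
```

In replay mode the third batch repeats the first cached one, which is acceptable for timing purposes because only the cost of obtaining a batch is being measured, not its contents.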

Additional information:

Affected modules and functionalities:

new module in the PyTorch plugin
new example
new test for the module
new documentation page describing the overall idea

Key points relevant for the review:

overall idea and flow

Tests:

[ ] Existing tests apply
[x] New tests added
- [x] Python tests
  - test_pytorch_loader_evaluator.py
- [ ] GTests
- [ ] Benchmark
- [ ] Other
[ ] N/A

Checklist

Documentation

[ ] Existing documentation applies
[x] Documentation updated
- [ ] Docstring
- [ ] Doxygen
- [x] RST
  - pytorch_data_loader_evaluator.rst
- [x] Jupyter
  - pytorch_data_loader_evaluator.ipynb
- [ ] Other
[ ] N/A

DALI team only

Requirements

[ ] Implements new requirements
[ ] Affects existing requirements
[x] N/A

REQ IDs: N/A

JIRA TASK: DALI-4299

Dec 05 '25 10:12 JanuszL

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Dec 05 '25 10:12 review-notebook-app[bot]

CI MESSAGE: [39670512]: BUILD STARTED

Dec 05 '25 10:12 dali-automaton

!build

Dec 05 '25 10:12 JanuszL

CI MESSAGE: [39670654]: BUILD STARTED

Dec 05 '25 10:12 dali-automaton

Greptile Overview

Greptile Summary

This PR adds a new LoaderEvaluator diagnostic tool to the PyTorch plugin for identifying data loading bottlenecks in training pipelines.

Key Changes:

New LoaderEvaluator class that wraps PyTorch DataLoader with two modes:
- log mode: Normal iteration with performance metrics collection
- replay mode: Caches batches in memory and replays them to simulate ideal (zero-overhead) data loading
Comprehensive test suite with edge case coverage
Documentation including RST pages and a Jupyter notebook tutorial demonstrating the bottleneck detection workflow

How It Works: The tool allows users to compare real data loading performance against ideal performance by caching a small number of batches and replaying them. If replay mode is significantly faster, it indicates a data loading bottleneck that could benefit from optimization (e.g., more workers, faster storage, or DALI).
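
The comparison described above can be sketched as a small calculation over the total times of a log run and a replay run. The function names and the 20% threshold below are hypothetical and chosen only for illustration; what counts as "significantly faster" depends on the workload.

```python
def loading_overhead(log_total_time, replay_total_time):
    """Fraction of the logged time that replay mode did not need (0.0 to 1.0)."""
    if log_total_time <= 0:
        raise ValueError("log_total_time must be positive")
    saved = log_total_time - replay_total_time
    # Clamp at zero so measurement noise cannot produce a negative overhead.
    return max(0.0, saved / log_total_time)


def looks_data_bound(log_total_time, replay_total_time, threshold=0.2):
    """Assumed rule of thumb: flag runs where replay saves at least `threshold`."""
    return loading_overhead(log_total_time, replay_total_time) >= threshold


# Example: an epoch took 10 s with the real loader and 6 s with replayed batches.
overhead = loading_overhead(10.0, 6.0)
data_bound = looks_data_bound(10.0, 6.0)
```

A large overhead suggests that time is being lost waiting for data, while a value near zero suggests the model computation dominates and tuning the loader is unlikely to help.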

Integration:

Cleanly integrates into the existing nvidia.dali.plugin.pytorch namespace
Test added to the pytorch test suite in qa/TL0_python-self-test-core/test_body.sh

Confidence Score: 4/5

This PR is safe to merge - it adds a new optional diagnostic tool with no impact on existing functionality.
Score of 4 reflects well-tested new functionality with comprehensive documentation. The implementation is straightforward and follows existing patterns in the codebase. Minor deduction because replay mode assumes that the DataLoader supports len(), which may not hold when it wraps an IterableDataset.
loader.py - consider documenting the len() requirement for replay mode
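
One way to handle the len() concern is sketched below in plain Python, under the assumption that a loader without a defined length raises TypeError on len(), as any Python object without `__len__` does. The helper names `replay_length` and `cache_batches`, the `fallback` parameter, and the `StreamLoader` class are hypothetical and are not part of loader.py.

```python
from itertools import islice


def replay_length(loader, fallback):
    """Return len(loader), or `fallback` when the loader has no length."""
    try:
        return len(loader)
    except TypeError:
        # Objects without __len__ (for example stream-style loaders) land here.
        return fallback


def cache_batches(loader, num_batches):
    """Cache the first `num_batches` batches without needing len(loader)."""
    return list(islice(loader, num_batches))


class StreamLoader:
    """Hypothetical stream-style loader: iterable, but without __len__."""

    def __iter__(self):
        for i in range(5):
            yield [i]


sized = replay_length([[0], [1], [2]], fallback=10)
unsized = replay_length(StreamLoader(), fallback=10)
cached = cache_batches(StreamLoader(), 2)
```

Whether a fallback length or an explicit error is the better behavior is a design choice for the PR author; the sketch only shows that caching itself does not require a length.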

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| dali/python/nvidia/dali/plugin/pytorch/loader_evaluator/loader.py | 4/5 | Core LoaderEvaluator implementation with two modes (log/replay). Well-structured with proper error handling. Minor observation: replay mode may fail if DataLoader doesn't support len() (e.g., IterableDataset). |
| dali/test/python/test_pytorch_loader_evaluator.py | 5/5 | Comprehensive test suite covering basic functionality, modes, metrics, edge cases, and error conditions. Good coverage. |
| docs/examples/frameworks/pytorch/loader_evaluator/pytorch_data_loader_evaluator.ipynb | 5/5 | Well-written tutorial notebook demonstrating bottleneck detection workflow with clear explanations and practical example. |
| docs/plugins/pytorch_data_loader_evaluator.rst | 5/5 | Comprehensive documentation explaining the tool's purpose, technical approach, and comparison with alternatives (nsys, PyTorch Profiler). |

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant LE as LoaderEvaluator
    participant DL as PyTorch DataLoader
    participant Cache as Batch Cache

    Note over User,Cache: Log Mode (Baseline Performance)
    User->>LE: for batch in loader (mode="log")
    LE->>DL: iter(dataloader)
    loop Each Batch
        LE->>DL: next()
        DL-->>LE: batch
        LE->>LE: Record batch_time
        LE-->>User: yield batch
    end
    User->>LE: get_metrics()
    LE-->>User: {total_time, avg_batch_time, ...}

    Note over User,Cache: Replay Mode (Ideal Performance)
    User->>LE: LoaderEvaluator(dataloader, mode="replay")
    LE->>DL: iter(dataloader) [during construction]
    loop Cache Batches
        DL-->>LE: batch
        LE->>Cache: append(batch)
    end
    User->>LE: for batch in loader
    loop Each Batch (from cache)
        LE->>Cache: get cached batch[i % cache_size]
        Cache-->>LE: batch
        LE->>LE: Record batch_time
        LE-->>User: yield batch
    end
    User->>LE: get_metrics()
    LE-->>User: {total_time, avg_batch_time, ...}

Dec 05 '25 10:12 greptile-apps[bot]

@greptileai please review again.

Dec 05 '25 12:12 JanuszL

!build

Dec 05 '25 12:12 JanuszL

CI MESSAGE: [39675479]: BUILD STARTED

Dec 05 '25 12:12 dali-automaton

@greptileai please review again.

Dec 05 '25 12:12 JanuszL

@greptileai please review again.

Dec 05 '25 13:12 JanuszL

!build

Dec 05 '25 13:12 JanuszL

CI MESSAGE: [39676718]: BUILD STARTED

Dec 05 '25 13:12 dali-automaton

CI MESSAGE: [39670654]: BUILD FAILED

Dec 05 '25 17:12 dali-automaton

CI MESSAGE: [39676718]: BUILD FAILED

Dec 05 '25 17:12 dali-automaton

!build

Dec 05 '25 18:12 JanuszL

CI MESSAGE: [39693133]: BUILD STARTED

Dec 05 '25 18:12 dali-automaton

CI MESSAGE: [39693133]: BUILD FAILED

Dec 05 '25 21:12 dali-automaton

CI MESSAGE: [39693133]: BUILD PASSED

Dec 07 '25 11:12 dali-automaton

!build

Dec 08 '25 05:12 JanuszL

CI MESSAGE: [39788983]: BUILD STARTED

Dec 08 '25 05:12 dali-automaton

CI MESSAGE: [39788983]: BUILD FAILED

Dec 08 '25 14:12 dali-automaton

CI MESSAGE: [39788983]: BUILD PASSED

Dec 08 '25 14:12 dali-automaton

!build

Dec 09 '25 17:12 JanuszL

CI MESSAGE: [39900875]: BUILD STARTED

Dec 09 '25 17:12 dali-automaton

!build

Dec 09 '25 17:12 JanuszL

CI MESSAGE: [39901633]: BUILD STARTED

Dec 09 '25 18:12 dali-automaton

CI MESSAGE: [39901633]: BUILD FAILED

Dec 09 '25 19:12 dali-automaton

CI MESSAGE: [39901633]: BUILD PASSED

Dec 09 '25 19:12 dali-automaton