DALI
Add origin stack trace capture for DALI operators
Category: New feature
Description:
Add capture of origin stack trace for DALI operators.
extract_stack collects all frames from the first frame of the pipeline definition to the frame with the operator invocation (the outermost user API).
In regular mode no further processing is needed.
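A minimal sketch of the capture idea in regular mode, using the standard library's traceback.extract_stack. The helper and function names below are hypothetical illustrations and not the DALI implementation; the real boundaries of the stack (first frame of the pipeline definition, operator invocation) are determined differently.

```python
import traceback


def capture_origin_stack(skip_top=1):
    """Return the current call stack, outermost frame first.

    skip_top drops the innermost frames; the default of 1 removes this
    helper itself, so the last frame is the function that called it.
    """
    stack = traceback.extract_stack()
    return stack[:len(stack) - skip_top] if skip_top else list(stack)


def fake_operator():
    # Hypothetical stand-in for an operator invocation in user code.
    return capture_origin_stack()


def pipeline_definition():
    # Hypothetical stand-in for a user-defined pipeline function.
    return fake_operator()


frames = pipeline_definition()
names = [frame.name for frame in frames]
```

Each entry is a traceback.FrameSummary, so the filename, line number, function name, and source line are available for later reporting.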
For code that was transformed by AutoGraph, the contents of the captured frames are remapped back to the user code, and frames from the _autograph and _conditionals modules (which contain internal DALI implementation) are filtered out.
Because AutoGraph introduces additional frames (for example, by implementing if statements with additional function calls), we remove repeated occurrences of the same function, keeping only the last one. AutoGraph's entry point to a function call is used to detect such regions.
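The filtering and de-duplication described above can be sketched roughly as follows. This is a simplification under stated assumptions: the Frame tuple, the file paths, and both helper functions are hypothetical, and the sketch de-duplicates over the whole stack instead of detecting regions through AutoGraph's call entry point, so it would also collapse legitimate recursion.

```python
from collections import namedtuple

# Hypothetical, simplified frame record (not a DALI type).
Frame = namedtuple("Frame", ["filename", "lineno", "name"])

# Module names taken from the description above.
INTERNAL_MARKERS = ("_autograph", "_conditionals")


def drop_internal(frames):
    """Remove frames whose file path points into the internal modules."""
    return [f for f in frames
            if not any(marker in f.filename for marker in INTERNAL_MARKERS)]


def keep_last_occurrence(frames):
    """Keep only the last frame for each (filename, function name) pair."""
    last_index = {}
    for i, f in enumerate(frames):
        last_index[(f.filename, f.name)] = i
    return [f for i, f in enumerate(frames)
            if last_index[(f.filename, f.name)] == i]


# Made-up stack: `pipe` appears three times because an `if` statement was
# implemented through extra internal calls that re-enter the user function.
stack = [
    Frame("user.py", 10, "pipe"),
    Frame("internal/_autograph/impl.py", 55, "converted_call"),
    Frame("user.py", 12, "pipe"),
    Frame("internal/_conditionals.py", 80, "if_stmt"),
    Frame("user.py", 14, "pipe"),
    Frame("user.py", 30, "helper"),
]
cleaned = keep_last_occurrence(drop_internal(stack))
```

After both steps only the innermost position in pipe (line 14) and the helper frame remain, which is what a user would expect to see in a trace of untransformed code.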
Stack traces are collected in OperatorInstance. Operators now receive a hidden argument, _api, that makes it possible to detect whether we are inside an fn or ops invocation, so internal frames of the DALI implementation can be discarded based on the API kind.
The collected stack is added to OpSpec as hidden arguments. This keeps serialized pipelines backward-compatible and allows the feature to be enabled or disabled.
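One way to picture why hidden arguments stay backward-compatible is sketched below with a plain dictionary standing in for the operator spec. The argument names and both helpers are assumptions made for illustration; they are not confirmed by this description and do not use the real OpSpec API.

```python
# Hypothetical hidden argument names; the real ones may differ.
KEY_FILENAME = "_origin_stack_filename"
KEY_LINENO = "_origin_stack_lineno"
KEY_NAME = "_origin_stack_name"


def stack_to_hidden_args(frames):
    """Flatten (filename, lineno, name) tuples into parallel lists."""
    return {
        KEY_FILENAME: [f[0] for f in frames],
        KEY_LINENO: [f[1] for f in frames],
        KEY_NAME: [f[2] for f in frames],
    }


def hidden_args_to_stack(spec):
    """Rebuild the frames; a spec without the hidden keys yields no frames."""
    filenames = spec.get(KEY_FILENAME, [])
    linenos = spec.get(KEY_LINENO, [])
    names = spec.get(KEY_NAME, [])
    return list(zip(filenames, linenos, names))


frames = [("user.py", 10, "pipe"), ("user.py", 30, "helper")]
spec = {"device": "cpu"}
spec.update(stack_to_hidden_args(frames))

restored = hidden_args_to_stack(spec)
legacy = hidden_args_to_stack({"device": "cpu"})  # spec saved without the feature
```

Because the reader treats the keys as optional, a spec written before the feature existed, or with the feature disabled, simply produces an empty stack.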
By default, the feature is disabled and available only via a hidden API.
The C++ test operators are extended to work as a loadable plugin, based on the dummy operator test. This makes it possible to implement tests for the new feature without extending regular DALI operators.
A follow-up to this PR will use this information to provide better error messages that point to the origin of the error in the DALI pipeline definition.
Additional information:
Affected modules and functionalities:
New functionality, extensions for Python API and new hidden arguments in all operators.
Key points relevant for the review:
Tests:
- [ ] Existing tests apply
- [x] New tests added
- [x] Python tests
- [ ] GTests
- [ ] Benchmark
- [ ] Other
- [ ] N/A
Checklist
Documentation
- [ ] Existing documentation applies
- [ ] Documentation updated
- [ ] Docstring
- [ ] Doxygen
- [ ] RST
- [ ] Jupyter
- [ ] Other
- [ ] N/A
DALI team only
Requirements
- [ ] Implements new requirements
- [ ] Affects existing requirements
- [ ] N/A
REQ IDs: N/A
JIRA TASK: N/A
!build
CI MESSAGE: [12955067]: BUILD STARTED
CI MESSAGE: [12955067]: BUILD FAILED
Steps based on the offline discussion with @stiepan:
- extend estimation of the bottom of the stack trace by checking whether parent frames are user-defined or come from DALI internals
- record the stack trace size when entering the API function and use it to skip the top
- test with pipe and push_current
- disable this feature for debug and eager mode - we rely purely on Python stack trace there
- I will probably remove _api for now if it is not needed here and introduce it later.
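The step about recording the stack size on API entry can be sketched as follows, again with traceback.extract_stack from the standard library. All function names are hypothetical; the point is only that a depth recorded at the API boundary lets a deeper internal call trim its own frames without inspecting file names.

```python
import traceback


def frames_above_caller():
    """Count the frames strictly outside the function that calls this helper."""
    # extract_stack() includes this helper and its caller, so subtract both.
    return len(traceback.extract_stack()) - 2


def _capture(user_depth):
    # Keep only the frames that existed before the API function was entered.
    return traceback.extract_stack()[:user_depth]


def _internal_layer(user_depth):
    # Stand-in for arbitrary internal calls between the API and the capture.
    return _capture(user_depth)


def api_function():
    # Hypothetical public API entry point: record the depth, then go deeper.
    user_depth = frames_above_caller()
    return _internal_layer(user_depth)


def user_code():
    return api_function()


names = [frame.name for frame in user_code()]
```

The resulting trace ends at user_code, regardless of how many internal layers sit between the API function and the capture call.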
!build
CI MESSAGE: [13033625]: BUILD STARTED
CI MESSAGE: [13033625]: BUILD FAILED
!build
CI MESSAGE: [13085457]: BUILD STARTED
CI MESSAGE: [13085457]: BUILD FAILED
!build
CI MESSAGE: [13090484]: BUILD STARTED
CI MESSAGE: [13090484]: BUILD FAILED
CI MESSAGE: [13090484]: BUILD PASSED
!build
CI MESSAGE: [13153535]: BUILD STARTED
CI MESSAGE: [13153535]: BUILD FAILED
CI MESSAGE: [13153535]: BUILD PASSED
!build
CI MESSAGE: [13262456]: BUILD STARTED
CI MESSAGE: [13262456]: BUILD PASSED