DALI
Improve performance of ImageDecoder
Category:
Other Performance
Description:
- Populate the CUDA event pool with a few events per thread, to avoid creating events during pipeline execution
- Use task priority in the thread pool used by nvimagecodec, so that tasks are executed in FIFO order
- Fix the preallocated_batch_size to take the hw_load into account. Not doing so causes cuMemFree calls during the pipeline run.
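
For reviewers unfamiliar with the FIFO-through-priority idea in the second point, here is a minimal sketch of one way it can work. It assumes a queue that always runs the highest-priority task first and gives each submitted task a priority that decreases with submission order. `FifoPriorityQueue`, `Submit` and `RunAll` are hypothetical names for a single-threaded model; this is not the actual DALI or nvimagecodec thread pool code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical, single-threaded model of a priority-ordered task queue.
// Illustration only: not the DALI / nvimagecodec implementation.
class FifoPriorityQueue {
 public:
  using Task = std::function<void()>;

  // Every submission receives a priority lower than the previous one, so a
  // "highest priority first" queue pops tasks in submission (FIFO) order.
  void Submit(Task task) {
    assert(task);  // reject empty callables
    queue_.push(Entry{next_priority_--, std::move(task)});
  }

  // Runs all pending tasks, highest priority first.
  void RunAll() {
    while (!queue_.empty()) {
      Task task = queue_.top().task;  // top() is const, so copy the callable
      queue_.pop();
      task();
    }
  }

 private:
  struct Entry {
    int64_t priority;
    Task task;
    // std::priority_queue is a max-heap with respect to operator<.
    bool operator<(const Entry &other) const {
      return priority < other.priority;
    }
  };

  std::priority_queue<Entry> queue_;
  int64_t next_priority_ = 0;  // 0, -1, -2, ... in submission order
};
```

Without an explicit, monotonically decreasing priority, tasks with equal priority may be popped from a heap in an unspecified order, which is why the submission counter is used as the ordering key in this sketch.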
Additional information:
Affected modules and functionalities:
- ImageDecoder, mostly (and any operator that has a mixed backend thread pool)
Key points relevant for the review:
Tests:
- [x] Existing tests apply
- [ ] New tests added
- [ ] Python tests
- [ ] GTests
- [ ] Benchmark
- [ ] Other
- [ ] N/A
Checklist
Documentation
- [x] Existing documentation applies
- [ ] Documentation updated
- [ ] Docstring
- [ ] Doxygen
- [ ] RST
- [ ] Jupyter
- [ ] Other
- [ ] N/A
DALI team only
Requirements
- [ ] Implements new requirements
- [ ] Affects existing requirements
- [x] N/A
REQ IDs: N/A
JIRA TASK: N/A