Data pipeline refactoring
Thank you for your contribution to the MindOCR repo. Before submitting this PR, please make sure:
- [x] You have read the Contributing Guidelines on pull requests
- [x] Your code builds clean without any errors or warnings
- [x] You are using approved terminology
- [ ] You have added unit tests
Motivation
Refactored the data pipeline to follow MindData best practices, including:
- Use `GeneratorDataset` for data loading only.
- Use `dataset.map` operation to apply data transformations and augmentations.
- Reduce number of Python transformations by grouping them into a single operation.
- Group MindSpore operations as well.
- Move to MindSpore operations where it is possible (`Decode`, `Normalize`, `HWC2CHW`).
- Integrate MindRecord support.
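
The grouping idea from the list above can be illustrated with a minimal, framework-free sketch. It is not the code in this PR: `GroupedTransforms`, `scale_pixels`, `encode_label`, `load_samples`, and the dict-based sample format are hypothetical names chosen for illustration. In the real pipeline the loader would presumably be wrapped in `GeneratorDataset` and the grouped callable passed to `dataset.map`, which operates on column values rather than dicts.

```python
from typing import Callable, Dict, Iterator, Sequence

Sample = Dict[str, object]


class GroupedTransforms:
    """Run several per-sample Python transforms as one callable.

    Hypothetical helper (not part of MindOCR or MindSpore). Grouping means
    the data pipeline invokes a single Python operation per sample instead
    of one operation per transform.
    """

    def __init__(self, transforms: Sequence[Callable[[Sample], Sample]]):
        self.transforms = list(transforms)

    def __call__(self, sample: Sample) -> Sample:
        # Apply every transform in order within a single call.
        for transform in self.transforms:
            sample = transform(sample)
        return sample


def scale_pixels(sample: Sample) -> Sample:
    # Toy stand-in for an image transform: map 0..255 values to 0..1.
    return {**sample, "image": [p / 255.0 for p in sample["image"]]}


def encode_label(sample: Sample) -> Sample:
    # Toy stand-in for label encoding: characters to integer codes.
    return {**sample, "label": [ord(c) for c in sample["label"]]}


def load_samples() -> Iterator[Sample]:
    # Loading only: yield raw samples and apply no transforms here.
    yield {"image": [0, 255], "label": "ab"}
    yield {"image": [255, 0], "label": "c"}


# One grouped operation replaces two separate per-sample operations.
pipeline = GroupedTransforms([scale_pixels, encode_label])
processed = [pipeline(sample) for sample in load_samples()]
```

Keeping the loader free of transforms and funnelling all Python-side work through one callable mirrors the split described above: loading in one place, transformations in a single mapped operation.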
Rebased onto the main branch to resolve conflicts.