Data pipeline refactoring

Open hadipash opened this issue 2 years ago • 1 comment


Motivation

Refactored the data pipeline to follow MindData best practices:

Use GeneratorDataset for data loading only.
Apply data transformations and augmentations through the dataset.map operation.
Reduce the number of Python transformations by grouping them into a single operation.
Group MindSpore operations in the same way.
Use native MindSpore operations where possible (Decode, Normalize, HWC2CHW).
Integrate MindRecord support.
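
To illustrate the grouping step, here is a minimal sketch in plain Python rather than MindSpore code. The helper `group_transforms` and the two toy transforms `scale_pixels` and `strip_label` are hypothetical names invented for this example and are not part of this PR or of MindOCR. It only shows the idea of fusing several per-sample Python transforms into one callable, so that a single map step runs them instead of one step per transform.

```python
from typing import Any, Callable, Dict, Iterable, List

Sample = Dict[str, Any]
Transform = Callable[[Sample], Sample]


def group_transforms(transforms: Iterable[Transform]) -> Transform:
    """Fuse several per-sample transforms into one callable (hypothetical helper)."""
    steps: List[Transform] = list(transforms)

    def fused(sample: Sample) -> Sample:
        # Apply every step in order inside a single Python call.
        for step in steps:
            sample = step(sample)
        return sample

    return fused


# Hypothetical stand-ins for Python-side transformations.
def scale_pixels(sample: Sample) -> Sample:
    # Map 8-bit pixel values to the [0, 1] range.
    return {**sample, "image": [p / 255.0 for p in sample["image"]]}


def strip_label(sample: Sample) -> Sample:
    # Remove surrounding whitespace from the text label.
    return {**sample, "label": sample["label"].strip()}


fused = group_transforms([scale_pixels, strip_label])
result = fused({"image": [0, 255], "label": " abc "})
```

In a real pipeline the fused callable would be the single Python operation handed to one map call. The sketch does not model how dataset.map passes columns to the operation, so the dictionary-based sample here is an assumption made for readability.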

Jun 16 '23 06:06 hadipash

Rebased onto the main branch to resolve conflicts.

Jul 07 '23 07:07 hadipash