
Data pipeline refactoring

Open · hadipash opened this issue 2 years ago · 1 comment


Motivation

Refactored the data pipeline to follow MindData best practices, including:

  1. Use GeneratorDataset for data loading only.
  2. Use the dataset.map operation to apply data transformations and augmentations.
  3. Reduce the number of Python transformations by grouping them into a single operation.
  4. Group MindSpore operations in the same way.
  5. Switch to native MindSpore operations where possible (Decode, Normalize, HWC2CHW).
  6. Integrate MindRecord support.
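Points 1 to 3 can be illustrated with a minimal, framework-free sketch. It assumes a MindSpore 2.x-style setup in which the loader below would be wrapped as `GeneratorDataset(loader, column_names=["image", "label"])` and the grouped transform passed once to `dataset.map(operations=[transform], input_columns=["image", "label"])`, with image decoding left to a native op such as `mindspore.dataset.vision.Decode()`. `RawLoader`, `GroupedPyTransform`, `strip_label`, and `encode_label` are hypothetical names for illustration, not MindOCR code; a plain list comprehension stands in for `dataset.map` so the sketch runs without MindSpore installed.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, Any]


class RawLoader:
    """Hypothetical GeneratorDataset source: it only loads raw samples.

    No decoding or augmentation happens here; that work is left to
    dataset.map so it can run in parallel workers.
    """

    def __init__(self, samples: Sequence[Tuple[bytes, str]]):
        self._samples = list(samples)

    def __getitem__(self, index: int) -> Tuple[bytes, str]:
        return self._samples[index]

    def __len__(self) -> int:
        return len(self._samples)


class GroupedPyTransform:
    """Chains several Python transforms into one callable.

    One map operation then costs a single Python call per sample instead
    of one call per transform.
    """

    def __init__(self, transforms: List[Callable[[Sample], Sample]]):
        self.transforms = transforms

    def __call__(self, image: bytes, label: str) -> Tuple[bytes, Any]:
        # Pack the columns into a dict so each transform can touch any field.
        data: Sample = {"image": image, "label": label}
        for transform in self.transforms:
            data = transform(data)
        # Unpack back into the column order expected by output_columns.
        return data["image"], data["label"]


def strip_label(data: Sample) -> Sample:
    # Hypothetical label cleanup: drop surrounding whitespace.
    data["label"] = data["label"].strip()
    return data


def encode_label(data: Sample, vocab: str = "abcdefghijklmnopqrstuvwxyz") -> Sample:
    # Hypothetical encoding: map each lower-cased character to its vocab index.
    data["label"] = [vocab.index(c) for c in data["label"].lower()]
    return data


loader = RawLoader([(b"fake-png-bytes", " cat "), (b"fake-jpg-bytes", "Dog")])
transform = GroupedPyTransform([strip_label, encode_label])

# Stand-in for dataset.map: apply the single grouped operation to every sample.
outputs = [transform(*loader[i]) for i in range(len(loader))]
print(outputs)
```

Keeping the loader free of transformations lets the pipeline schedule decoding and augmentation across workers, and grouping the Python steps reduces per-sample interpreter overhead, which is the motivation behind points 1 and 3.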

hadipash · Jun 16 '23 06:06

Rebased onto the main branch to resolve conflicts.

hadipash · Jul 07 '23 07:07