
[WIP] Utilize source extractor length in Dataset if available

Open zhiltsov-max opened this issue 3 years ago • 0 comments

Summary

Resolves #350. This behavior should stay allowed, because it offers a simple way to write transforms that affect only the input item but may rename it, move it to another subset, or remove it. However, the ItemTransform class should perhaps be renamed.

This patch adds InplaceTransform, a new child of ItemTransform that prohibits changing item ids and removing items, which allows more optimizations in Dataset.

  • Allows the Dataset to use the optimized __len__ implementations provided by source extractors, when possible.

  • Allows the cache to be initialized during iteration when the dataset contains only InplaceTransforms. Without this change, Dataset always initializes the cache up front in these situations, which is a performance pessimization.

  • Improved Transform class descriptions

  • Introduced a dedicated object indicating that a Transform uses the parent Extractor's values (for len and subsets)

  • [ ] Assess the impact and corner cases

  • [x] Add more tests
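
The length optimization described above can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: Item, ListExtractor, INHERIT, Relabel, and DropUnlabeled are hypothetical names, and the ItemTransform and InplaceTransform classes here are simplified stand-ins that only mirror the contract stated in the summary (an in-place transform keeps item ids and never removes items, so it can reuse the parent's length).

```python
from dataclasses import dataclass, replace
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a dataset item."""
    id: str
    subset: str = "default"
    label: Optional[str] = None


class _Inherit:
    """Hypothetical sentinel type meaning 'use the parent extractor's value'."""
    def __repr__(self) -> str:
        return "INHERIT"


INHERIT = _Inherit()


class ListExtractor:
    """Hypothetical source extractor with a cheap, known length."""
    def __init__(self, items: List[Item]):
        self._items = list(items)
        self.iterations = 0  # number of times iteration was started

    def __iter__(self) -> Iterator[Item]:
        self.iterations += 1
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


class ItemTransform:
    """Simplified stand-in: may rename, move, or drop items (by returning None)."""
    length = None  # unknown up front, so counting requires iteration

    def __init__(self, source):
        self._source = source

    def transform_item(self, item: Item) -> Optional[Item]:
        raise NotImplementedError

    def __iter__(self) -> Iterator[Item]:
        for item in self._source:
            out = self.transform_item(item)
            if out is not None:
                yield out

    def __len__(self) -> int:
        if self.length is INHERIT:
            return len(self._source)  # delegate to the parent, no iteration
        return sum(1 for _ in self)   # fall back to a full pass


class InplaceTransform(ItemTransform):
    """Simplified stand-in: must keep item ids and must not remove items."""
    length = INHERIT

    def __iter__(self) -> Iterator[Item]:
        for item in self._source:
            out = self.transform_item(item)
            if out is None or out.id != item.id:
                raise ValueError("InplaceTransform must keep every item and its id")
            yield out


class Relabel(InplaceTransform):
    def transform_item(self, item: Item) -> Optional[Item]:
        return replace(item, label="object")


class DropUnlabeled(ItemTransform):
    def transform_item(self, item: Item) -> Optional[Item]:
        return item if item.label is not None else None


source = ListExtractor([Item("a", label="cat"), Item("b"), Item("c", label="dog")])

n_inplace = len(Relabel(source))         # delegated to the source's __len__
passes_after_inplace = source.iterations

n_filtered = len(DropUnlabeled(source))  # has to iterate to count survivors
passes_after_filtered = source.iterations
```

In this sketch the sentinel is compared by identity, so a transform that leaves `length` as `None` always falls back to counting by iteration, while one that sets it to `INHERIT` never touches the source's items just to report its size.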

How to test

Checklist

  • [ ] I submit my changes into the develop branch
  • [ ] I have added description of my changes into CHANGELOG
  • [ ] I have updated the documentation accordingly
  • [ ] I have added tests to cover my changes
  • [ ] I have linked related issues

License

  • [ ] I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.
  • [ ] I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

zhiltsov-max, Dec 22 '21 13:12