datumaro
[WIP] Utilize source extractor length in Dataset if available
Summary
Resolves #350 - it should be allowed, because it offers a convenient way to simplify transforms that affect only the input item but can rename it, move it to another subset, or remove it. However, the `ItemTransform` class should perhaps be renamed.
This patch adds a new `ItemTransform` child, `InplaceTransform`, which prohibits changing item ids and removing items, allowing more optimizations in `Dataset`.
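The contract described above can be illustrated with a minimal, self-contained sketch. It does not use Datumaro's actual classes: `Item`, the `transform_item` hook, and the runtime checks are hypothetical stand-ins chosen only to show why forbidding id changes and removals lets a transform reuse its source's length.

```python
from dataclasses import dataclass, replace
from typing import Iterator, Optional, Sequence


@dataclass(frozen=True)
class Item:
    # Hypothetical stand-in for a dataset item
    id: str
    subset: str = "default"
    label: int = 0


class ItemTransform:
    """May rename an item, move it to another subset, or drop it (return None)."""

    def __init__(self, source: Sequence[Item]):
        self._source = source

    def transform_item(self, item: Item) -> Optional[Item]:
        raise NotImplementedError

    def __iter__(self) -> Iterator[Item]:
        for item in self._source:
            result = self.transform_item(item)
            if result is not None:
                yield result


class InplaceTransform(ItemTransform):
    """Must keep the item id and must not drop items."""

    def __iter__(self) -> Iterator[Item]:
        for item in self._source:
            result = self.transform_item(item)
            # Enforce the contract that keeps the source length valid
            if result is None:
                raise ValueError("InplaceTransform must not remove items")
            if result.id != item.id:
                raise ValueError("InplaceTransform must not change item ids")
            yield result

    def __len__(self) -> int:
        # Exactly one output per input, so the source length can be reused
        return len(self._source)


class SetLabel(InplaceTransform):
    # Example transform: changes only the item contents
    def transform_item(self, item: Item) -> Optional[Item]:
        return replace(item, label=1)


items = [Item("a"), Item("b")]
relabeled = SetLabel(items)
```

Here `len(relabeled)` is answered from the source without running the transform, which is the kind of optimization the stricter class is meant to enable.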
- Allows using the optimized versions of `__len__` provided by the source extractors in the `Dataset`, when possible.
- Allows initializing the cache during iteration when there are only `InplaceTransform`s in the dataset. Without this change, `Dataset` would always initialize the cache in these situations, which is a performance pessimization.
- Improved `Transform` class descriptions.
- Introduced a dedicated object indicating that the parent `Extractor`'s values are used in the `Transform` (for `len` and `subsets`).
- [ ] Assess the impact and corner cases
- [x] Add more tests
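A rough sketch of how the length delegation, the "use the parent's value" marker, and cache initialization during iteration could fit together is given below. All names (`INHERIT`, `Source`, `Transform`, `Filter`, `LazyDataset`) are hypothetical and chosen for illustration; this is not the implementation in this patch.

```python
class _InheritFromParent:
    """Hypothetical sentinel meaning 'use the parent's value'."""

    def __repr__(self):
        return "INHERIT"


INHERIT = _InheritFromParent()


class Source:
    """Hypothetical extractor that knows its length without iterating."""

    def __init__(self, n):
        self._n = n
        self.iterations = 0  # counts full passes, for demonstration only

    def __iter__(self):
        self.iterations += 1
        return iter(range(self._n))

    def __len__(self):
        return self._n


class Transform:
    # Transforms that keep the item count leave this as INHERIT
    length = INHERIT

    def __init__(self, parent):
        self.parent = parent

    def __iter__(self):
        return (x * 2 for x in self.parent)


class Filter(Transform):
    length = None  # the item count changes, so it cannot be inherited

    def __iter__(self):
        return (x for x in self.parent if x % 2 == 0)


class LazyDataset:
    def __init__(self, source):
        self._source = source
        self._cache = None

    def __len__(self):
        if self._cache is not None:
            return len(self._cache)
        node = self._source
        # Skip transforms that declare they inherit the parent's length
        while isinstance(node, Transform) and node.length is INHERIT:
            node = node.parent
        if not isinstance(node, Transform) and hasattr(node, "__len__"):
            return len(node)  # optimized length from the source
        # Fallback: the length is unknown, so the cache must be built
        self._cache = list(self._source)
        return len(self._cache)

    def __iter__(self):
        if self._cache is not None:
            yield from self._cache
            return
        seen = []
        for item in self._source:
            seen.append(item)
            yield item
        self._cache = seen  # set only after a complete pass


src = Source(3)
ds = LazyDataset(Transform(src))
ds_filtered = LazyDataset(Filter(Source(4)))
```

In this sketch, `len(ds)` never iterates the source, while `len(ds_filtered)` has to fall back to building the cache because `Filter` may remove items.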
How to test
Checklist
- [ ] I submit my changes into the `develop` branch
- [ ] I have added a description of my changes to CHANGELOG
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues
License
- [ ] I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT