tonic
AugDiskCachedDataset to map the copy index to augmentation parameter
This branch adds a child class of `DiskCachedDataset` called `AugDiskCachedDataset`.
Its main use is for a family of so-called deterministic augmentations with a discrete parameter space, for instance a noise augmentation on audio samples in which the SNR can take only 5 values.
- In `DiskCachedDataset`, num_copies can be used to generate N copies of a data sample. This is fine when the transforms/augmentations used have an infinite/probabilistic parameter space, because the chance of generating repeated augmented versions is then very low.
- For deterministic augmentations with N parameter values, on the other hand, it is advantageous to map the copy index to the parameter. This avoids re-generating existing samples and makes sure the generated copies cover the whole desired parameter space.
- The main feature of this class is that the index of the file copy is mapped to the parameter of the augmentation.
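
As a rough illustration of the idea in the list above, here is a minimal sketch of mapping a copy index to a fixed augmentation parameter. It is not the code in this branch and does not use the tonic API: `SNR_VALUES_DB`, `add_noise_at_snr` and `augmented_copy` are hypothetical names made up for this example, and the five SNR values are assumed.

```python
import numpy as np

# Hypothetical discrete parameter space: five SNR values in dB (assumed).
SNR_VALUES_DB = [0, 5, 10, 15, 20]


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so the result has the requested SNR."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


def augmented_copy(signal, copy_index):
    """Pick the SNR from the copy index instead of sampling it at random."""
    snr_db = SNR_VALUES_DB[copy_index % len(SNR_VALUES_DB)]
    return snr_db, add_noise_at_snr(signal, snr_db)


# One copy per parameter value: every SNR is used exactly once.
signal = np.sin(np.linspace(0, 2 * np.pi, 1000))
copies = [augmented_copy(signal, i) for i in range(len(SNR_VALUES_DB))]
used_snrs = [snr_db for snr_db, _ in copies]
```

With N copies and N parameter values, each value is generated exactly once, whereas drawing the SNR at random for each copy could repeat some values and miss others.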
Hello @MinaKh! Currently I don't understand how what you're trying to achieve with this class cannot already be done with existing classes. It seems to me that you want to control the augmentations exactly, but then I don't understand why they're called augmentations. Can you please
- provide an example of how you use your proposed class
- explain with a concrete example why the current code cannot do what you need to do
Before I can merge this, the class would also need a test, so it would be helpful to add one.
Codecov Report
Attention: 22 lines in your changes are missing coverage. Please review.
Comparison is base (db13037) 76.80% compared to head (c9d26b0) 77.34%. Report is 12 commits behind head on develop.
Files | Patch % | Lines |
---|---|---|
tonic/cached_dataset.py | 58.69% | 19 Missing :warning: |
tonic/audio_transforms.py | 92.00% | 2 Missing :warning: |
tonic/audio_augmentations.py | 98.78% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #274 +/- ##
===========================================
+ Coverage 76.80% 77.34% +0.53%
===========================================
Files 53 54 +1
Lines 3001 3165 +164
===========================================
+ Hits 2305 2448 +143
- Misses 696 717 +21
Hi @biphasic! Thanks for your feedback.
- Deterministic augmentations are not uncommon, especially in audio processing, and they are still called augmentations, just applied in a more controlled way.
- I have added a notebook at `docs/tutorails/Aug_DiskCachDataset.ipynb` and have addressed the points you raised there using a synthetic dataset. Please let me know if anything is unclear.
- I have also added a test, test/test_aug_caching.py, which is quite similar to what I presented in the notebook. Please let me know if you have other ideas for tests.
- This branch has been merged with the branch of my other PR: adding audio transforms.....