physicsnemo
Add memory-efficient InfiniteHashSampler and infinite sampler tests
PhysicsNeMo Pull Request
Description
Introduces InfiniteHashSampler, a memory-efficient infinite sampler for very large datasets (a billion samples or more). It randomizes indices with a hash function instead of storing a full index array. This PR also adds tests for both infinite samplers.
- Hash-Based Randomization: Deterministic pseudo-random sampling using an efficient hash function
- Distributed Training Support: Full compatibility with DistributedDataParallel (DDP)
- Billion-Scale Ready: Tested with datasets of up to 10 billion samples
- Sequential Fallback: Option to disable randomization for sequential access
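
For reviewers unfamiliar with the idea, the features above can be illustrated with a minimal, framework-free sketch. This is not the code in this PR: the class name `HashSamplerSketch`, its parameters, and the SplitMix64-style mixer `mix64` are assumptions chosen for illustration. The sketch hashes a per-rank step counter to get an index, so memory use is constant regardless of dataset size. Note that hashing followed by a modulo draws indices with replacement rather than producing a true permutation.

```python
import itertools

_MASK64 = (1 << 64) - 1  # keep arithmetic within 64 bits


def mix64(x: int) -> int:
    """SplitMix64-style mixer: maps an integer to a pseudo-random 64-bit value."""
    z = (x + 0x9E3779B97F4A7C15) & _MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & _MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & _MASK64
    return z ^ (z >> 31)


class HashSamplerSketch:
    """Hypothetical infinite sampler that never materializes an index array."""

    def __init__(self, dataset_size, rank=0, world_size=1, shuffle=True, seed=0):
        if dataset_size <= 0:
            raise ValueError("dataset_size must be positive")
        if not 0 <= rank < world_size:
            raise ValueError("rank must satisfy 0 <= rank < world_size")
        self.dataset_size = dataset_size
        self.rank = rank
        self.world_size = world_size
        self.shuffle = shuffle
        self.seed = seed

    def __iter__(self):
        # Each rank walks its own strided slice of the global step counter,
        # so ranks consume disjoint steps without any communication.
        step = self.rank
        seed_mix = mix64(self.seed)
        while True:
            if self.shuffle:
                # Deterministic pseudo-random index derived from (seed, step).
                yield mix64(step ^ seed_mix) % self.dataset_size
            else:
                # Sequential fallback: wrap around the dataset in order.
                yield step % self.dataset_size
            step += self.world_size


# Usage: take a few indices from a 10-billion-sample dataset on rank 0 of 4.
sampler = HashSamplerSketch(10_000_000_000, rank=0, world_size=4, seed=42)
first_indices = list(itertools.islice(sampler, 5))
```

Because the index is a pure function of the seed and the step counter, restarting with the same arguments reproduces the same sequence, and only a handful of integers are kept in memory.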
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
- [x] The CHANGELOG.md is up to date with these changes.
- [ ] An issue is linked to this pull request.