
Bug in _DaliBaseIterator __len__

Open tdemin16 opened this issue 2 years ago • 1 comments

I was trying to use two DALIGenericIterators with PyTorch Lightning and encountered `WARNING:root:DALI iterator does not support resetting while epoch is not finished`. This happened even though the batch size of the second data loader was chosen so that its number of steps matched the length of the first data loader. I should also mention that I encountered this bug while using DROP as the last-batch policy.

So I checked the source code, and my explanation is the following: each time `__next__` is called, the variable `_counter` is incremented by the batch size (`self._counter += self.batch_size`). That usually works, but when `_size` is evenly divisible by the batch size (i.e., `_size // batch_size == _size / batch_size`), `_counter` equals `_size` at iteration `__len__ - 2` instead of `__len__ - 1`. So a quick solution would be to fix the `__len__` method.
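To make the arithmetic concrete, here is a minimal sketch (not DALI's actual source; `steps_until_full` is a hypothetical helper) of the counter logic described above. It shows that when `_size` is evenly divisible by the batch size, the counter lands exactly on `_size` after `_size // batch_size` full batches, whereas with a non-divisible size the raw counter needs one extra step to pass `_size` — the kind of off-by-one that can desynchronize `__len__` from the iterator's actual stopping point:

```python
def steps_until_full(size: int, batch_size: int) -> int:
    """Count __next__-style steps until the counter reaches size.

    Mirrors the increment from the issue: self._counter += self.batch_size.
    """
    counter = 0
    steps = 0
    while counter < size:
        counter += batch_size  # one increment per batch, as in __next__
        steps += 1
    return steps

# Divisible case: the counter hits size exactly on the last full batch.
assert steps_until_full(100, 10) == 100 // 10  # 10 steps

# Non-divisible case: the counter overshoots (100 -> 110), taking one
# more step than the size // batch_size that a DROP-policy __len__
# would report.
assert steps_until_full(105, 10) == 11
```

This is only an illustration of the counter arithmetic; the real `_DaliBaseIterator` also handles sharding and reset logic that is omitted here.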

I hope I have been clear in my explanation and that I did not misunderstand anything.

tdemin16 avatar Mar 03 '23 15:03 tdemin16

Hi @tdemin16,

Thank you for reaching out. I understand your concern regarding the operation of the DALIGenericIterators. The idea behind the DROP policy is (as stated in the documentation) to drop the last batch if it cannot be fully filled with data from the current epoch. In your case it can be, so this could be the expected behavior. If you can provide a minimal, self-contained reproduction we can run, it would help us determine whether this is the case.

JanuszL avatar Mar 06 '23 08:03 JanuszL