Slowdown with latest release
Hi,
I just wanted to let you know that I'm having some issues with the latest release.
The main issue is a massive slowdown in our CI/CD (from about 45-50 minutes for all jobs in the matrix to 14 hours or more).
Here is a build with the latest release (0.19.4, installed from PyPI) and here is the same build with release 0.19.3 (installed from PyPI). These builds are completely identical apart from the batchgenerators version.
Unfortunately I have not had time to pinpoint the error yet.
Best, Justus
Hi, thanks for letting me know! I did not observe any kind of slowdown myself. It would be great if you could put together a minimal example where I can reproduce this behavior. Best, Fabian
I'll try, but I don't think this will be easy, since I could not reproduce it with the exact same tests on my local machine. In our CI/CD, however, this behavior was consistent across multiple runs and branches.
Hi, I have some time today to work on issues such as this one. Unfortunately I don't know what the problem is because everything works just fine in all my experiments. Still, I am a performance guy and I want this code to perform well for everybody :-) So: Have you had the opportunity to create a code snippet to reproduce the problem? That would help a lot. Best, Fabian
Hi, unfortunately I was not able to create a simple snippet for this, since we use batchgenerators deep inside our framework. I will see how much I can simplify things, but in general we just use the MultiThreadedAugmenter with a subclass of the DataLoader and some additional queues for interprocess communication, roughly like the sketch below. It all works fine with batchgenerators 0.19.3 but not with the latest release.
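For context, here is a minimal sketch of that kind of setup (the loader, data, and parameter values are made up, and exact import paths may differ slightly between batchgenerators versions):

```python
import numpy as np
from batchgenerators.dataloading.data_loader import SlimDataLoaderBase
from batchgenerators.dataloading.multi_threaded_augmenter import MultiThreadedAugmenter


class MyDataLoader(SlimDataLoaderBase):
    """Hypothetical loader: draws random samples from an in-memory array."""

    def generate_train_batch(self):
        idx = np.random.choice(len(self._data), self.batch_size)
        return {'data': np.stack([self._data[i] for i in idx])}


data = np.random.rand(100, 1, 32, 32).astype(np.float32)
loader = MyDataLoader(data, batch_size=4)

# transform=None just passes batches through; in our pipeline a Compose(...)
# of spatial/noise transforms would go here instead.
mt = MultiThreadedAugmenter(loader, transform=None, num_processes=4,
                            num_cached_per_queue=2, seeds=None)

for _ in range(10):
    batch = next(mt)
```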
Can you be a little more specific? Is the CPU usage high but nothing happens? Is the CPU not properly utilized? Where does it seem to hang? Best, Fabian
I can't tell you anything about CPU usage and stuff like that (sorry!) since this issue only occurs in our CI/CD and not on my local machine (thus it is hard to reproduce). I'll try my very best to reproduce it on my local machine with a minimal snippet.
Hi there, any news on this issue?
Hi Fabian, thanks for getting back to this. Unfortunately not. At some point we wrote our own reimplementation of the multiprocessing part to better fit our pipeline, so I stopped investigating. Sorry!
OK then. Do you have any idea what could have caused this?
Unfortunately I don't. Maybe it was just an issue with our integration, since others aren't experiencing the same problem.
One thing our implementation does not handle well is re-instantiating the multithreaded augmenter all the time. It can take a while to shut down and can therefore cause delays. Does this sound familiar?
We re-instantiated it twice every epoch, but we had a look at that and it seems all the processes were terminated. To my understanding this should have been fine.
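To illustrate the pattern in question, here is a sketch of per-epoch re-instantiation with an explicit shutdown (assuming `_finish()` is the shutdown method of MultiThreadedAugmenter; the helper function and parameter values are illustrative only):

```python
def run_epoch(loader, transform, num_batches):
    # Fresh augmenter per epoch, as in our training loop.
    mt = MultiThreadedAugmenter(loader, transform, num_processes=4,
                                num_cached_per_queue=2, seeds=None)
    try:
        for _ in range(num_batches):
            batch = next(mt)
            # ... train on batch ...
    finally:
        # Explicitly shut the workers down. If the background pin_memory
        # thread does not terminate here, the next instantiation is delayed.
        mt._finish()
```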
The processes are not the issue. The problem lies in the pin_memory_loop, which for some reason I don't understand does not terminate :-/
Ah okay. This is just a theory, but have you tried joining the thread as they do in PyTorch? Apart from that, everything seems to be the same when it comes to the pin_memory part.
I just tried it and unfortunately it does not work. The thread is not joining. I believe this may be caused by some objects not being freed. Maybe the workers did not release a file handle on their end of the pipe or something like that, causing the queues not to be closed.
This may be the case. Maybe it's just what they do here: https://github.com/pytorch/pytorch/blob/master/torch/utils/data/dataloader.py#L926. They put one last item into the queue so that the loop wakes up and checks the shutdown event. But this is just guessing based on a comparison of your code and theirs.
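To make the idea concrete, here is a generic sketch of that shutdown pattern (not batchgenerators' actual code; the queue, event, and sentinel names are made up). The consumer thread blocks on `queue.get()`, so the shutdown path first sets an event and then puts one final item so the thread wakes up, sees the event, and exits before being joined:

```python
import queue
import threading

_SENTINEL = None  # placeholder item used only to wake the loop


def pin_memory_loop(in_queue, out_queue, abort_event):
    # Consumer thread: blocks on get(); without a final put() it may never
    # observe the abort event and therefore never terminate.
    while True:
        item = in_queue.get()
        if abort_event.is_set():
            return
        if item is _SENTINEL:
            continue
        out_queue.put(item)  # in the real loop the batch would be pinned here


def shutdown(in_queue, thread, abort_event):
    abort_event.set()
    in_queue.put(_SENTINEL)  # wake the blocked get() so the event is checked
    thread.join(timeout=5)
```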