albumentations icon indicating copy to clipboard operation
albumentations copied to clipboard

Now random stuff can receive, and use, a RandomState object

Open ricardodeazambuja opened this issue 2 years ago • 9 comments

PROS:

  • you can pass a numpy.random.RandomState object to everything that uses a pseudo-random generator
  • you don't always need to pass a numpy.random.RandomState object because, by default, it will fall back to using the current state of numpy.random (if you don't care about reproducibility or need to debug something, it's just fine)
  • pickle still works :smile:

CONS:

  • Python standard library random was totally replaced by numpy.random (sometimes adapted), therefore it may be slower in some situations and it may need some TLC to optimize things
  • Old code that was using np.seed or random.seed will be even more unreliable now
  • Numpy is already moving away from RandomState... something to think in the near future
  • Tests using the files from tests/files were skipped
  • Not deeply debugged or tested besides the automatic tests

ricardodeazambuja avatar Nov 23 '22 23:11 ricardodeazambuja

Current implementation will work incorrectly with python multiprocessing because each process wil use the same random seed. For this reason we used random istead of np.random as base random generator.

Dipet avatar Nov 24 '22 03:11 Dipet

Look at https://github.com/albumentations-team/albumentations/blob/master/tests/test_random.py

Dipet avatar Nov 24 '22 03:11 Dipet

Hi @Dipet, I just pushed a simple modification that has a test specifically for the situation where the user wants to use the random state. Could you please give me another code example where this implementation will work incorrectly with python multiprocessing since it passes the original test? I never used albumentations before and I will start using it just because it seemed easier to implement the use of random states compared to torchvision transforms :smiley: Thanks!

ricardodeazambuja avatar Nov 24 '22 13:11 ricardodeazambuja

I've been using this modified version passing a Numpy RandomState object and it is super useful for reproducibility. Albumentations is amazing, and I love using it, so I am here offering my help. What else should I do to get it merged?

ricardodeazambuja avatar Jan 03 '23 14:01 ricardodeazambuja

Why was this closed?

jpcbertoldo avatar Jan 31 '23 09:01 jpcbertoldo

@jpcbertoldo, this PR is still alive and kicking :smile: #1377 was closed because it depends on this one, but this one never gets pulled and I need to keep updating it in the hope one day it will get accepted :cry:

ricardodeazambuja avatar Jan 31 '23 12:01 ricardodeazambuja

Ah yes, I confused with #1377 my bad hehe

hope one day it will get accepted

inshallah! it's a major upgrade

For this reason we used random istead of np.random as base random generator.

Would it be possible to use random.Random instead of np.random.RandomState?

jpcbertoldo avatar Jan 31 '23 13:01 jpcbertoldo

np.random

Do you know if there's any issue open in numpy for this issue?

I'm not sure because I don't know that much about parallelization, but could there be any hint in here?

https://numpy.org/doc/stable/reference/random/parallel.html

jpcbertoldo avatar Jan 31 '23 13:01 jpcbertoldo

np.random

Do you know if there's any issue open in numpy for this issue?

I'm not sure because I don't know that much about parallelization, but could there be any hint in here?

https://numpy.org/doc/stable/reference/random/parallel.html

Everything is working, and I just merged the latest stuff from master, again. The problem was in the test itself because the test was not expecting something using a random state, but I fixed it long time ago.

ricardodeazambuja avatar Jan 31 '23 13:01 ricardodeazambuja