CIL icon indicating copy to clipboard operation
CIL copied to clipboard

Update random number generator

Open hrobarts opened this issue 11 months ago • 3 comments

Changes

Update AcquistionGeometry, ImageGeometry and VectorGeometry allocate RANDOM and RANDOM_INT methods to optionally use numpy.Generator.default_rng random number generator.

  • Avoids creating an array of float64 then casting to the array dtype, which can allocate more memory than is needed.
  • Old methods can be accessed with new fill type labels RANDOM_DEPRECATED and RANDOM_INT_DEPRECATED.
  • The NEW versions will no longer allow users to set the random seed globally, which may have an impact where algorithms don't pass the seed to allocate.
  • For now use deprecated methods in tests where they expect to be able to use a global seed. In the future we should aim to use the new methods as default. An example is https://github.com/TomographicImaging/CIL/pull/2106

Testing you performed

Tests in test_DataContainer and test_AcquisitionGeometry

Please add any demo scripts to https://github.com/TomographicImaging/CIL-Demos/tree/main/misc

Related issues/links

Closes #2033 Might close #804

Checklist

  • [x] I have performed a self-review of my code
  • [x] I have added docstrings in line with the guidance in the developer guide
  • [x] I have updated the relevant documentation
  • [x] I have implemented unit tests that cover any new or modified functionality
  • [ ] CHANGELOG.md has been updated with any functionality change
  • [ ] Request review from all relevant developers
  • [ ] Change pull request label to 'Waiting for review'

Contribution Notes

Please read and adhere to the developer guide and local patterns and conventions.

  • [x] The content of this Pull Request (the Contribution) is intentionally submitted for inclusion in CIL (the Work) under the terms and conditions of the Apache-2.0 License
  • [x] I confirm that the contribution does not violate any intellectual property rights of third parties

hrobarts avatar Jan 14 '25 10:01 hrobarts

I looked at some different methods for random number generation

  1. Previous method Using numpy.random.seed() and numpy.random.random() directly. Cons: doesn't let us choose the data type
  2. Calling numpy.random.Generator() in each file Cons: no global seed
  3. Use numpy.random.Generator() as a global rng in a utils file, with a set_seed method
stream = np.random.PCG64DXSM
global_rng = np.random.Generator(stream(None))

def set_seed(seed):
    global global_rng
    global_rng = np.random.Generator(stream(seed))

Pros: global seed Cons: using a fixed bit generator

  1. Use numpy.random.Generator() in a RandomGenerator class in a utils file
class RandomGenerator:
    def __init__(self, seed=None):
        self._seed = seed
        self._rng = np.random.Generator(np.random.PCG64DXSM(seed))

    def set_seed(self, seed):
        self._seed = seed
        self._rng = np.random.Generator(np.random.PCG64DXSM(seed))

    def __getattr__(self, name):
        return getattr(self._rng, name)

global_rng = RandomGenerator(seed=None)

Use __getattr__ method so we can directly call the random numbers like global_rng.random() rather than something like global_rng.get_rng().random() Pros: global seed Cons: complicated

Method Pros Cons
numpy.random.seed() in each file Global seed Doesn't let us choose the data type
numpy.random.Generator(seed) in each file No global seed
Global Generator with set_seed method Global seed Less flexible for future changes
RandomGenerator Class with __getattr__ Global seed Complicated

hrobarts avatar Feb 20 '25 16:02 hrobarts

Thanks Hannah, just a quick few comments, mainly on the documentation. I think most comments relate to all three files - image, acquisition and vector geometry

Thanks @MargaretDuff I think I've updated all three files with the suggestions

hrobarts avatar Mar 10 '25 15:03 hrobarts

Discussion in dev meeting

  • Use out=allocate(empty) then rng.random(out=out). To be decided: moving the random functionality to fill and just call fill from geometry.allocate? How do we want to handle the deprecated version? Could keep deprecated version in allocate and new version in fill?
  • Allocate and fill should be able to do these functions and use standard np functions to do it https://data-apis.org/array-api/latest/API_specification/creation_functions.html
  • Move allocate None to the top
  • Update the CHANGELOG

hrobarts avatar Apr 15 '25 09:04 hrobarts

Has an issue been opened/will an issue be opened to address removing the use of the deprecated methods?

lauramurgatroyd avatar May 21 '25 11:05 lauramurgatroyd

After discussion with @MargaretDuff

  • Noticed we can currently use geometry.allocate with an array because it just passes whatever is in value to fill
  • Perhaps most logical usage would be allocate to be used for generate data e.g. None, uniform or random array. And fill for fill with an array.

hrobarts avatar May 27 '25 07:05 hrobarts