Same numbers for random number generation on different split axes

TheSlimvReal opened this issue · 6 comments

Description: Currently the random array returned by ht.random.rand is different depending on the split of the tensor.

For example:

>>> import heat as ht
>>> import numpy as np
>>> ht.random.seed(1)
>>> a = ht.random.randn(4, 4, split=0)
>>> ht.random.seed(1)
>>> b = ht.random.randn(16, split=0)
>>> np.array_equal(a.numpy().flatten(), b.numpy())
True

>>> ht.random.seed(1)
>>> a = ht.random.randn(4, 4, split=1)
>>> np.array_equal(a.numpy().flatten(), b.numpy())
False

Expected behavior: The counter_sequence function needs to be adapted to take the actual split axis into account in its calculation.

TheSlimvReal commented on Sep 16 '19

The problem is the following: the random numbers are generated by creating an increasing sequence of tuples, where the second value is incremented until a threshold is reached; after that, the first value is incremented by one and the process starts over ((0, 0), (0, 1), (0, 2), ..., (0, MAX), (1, 0), (1, 1), ...). These tuples are transformed into "random" numbers by the Threefry algorithm. Each tuple yields 2 new numbers that need to be placed next to each other in order to obtain a proper random distribution.

This can be illustrated by placing the initial tuples in the shape the final result will have. For a 3x5 matrix this would look like the following:

    |(0,  0) (0,  1) (0,| 2)
 (0,| 2) (0,  3) (0,  4)|
    |(0,  5) (0,  6) (0,| 7)

(The parentheses are just there to highlight the tuples, and the vertical lines mark the bounds of the resulting matrix.)

Because of the odd number of elements in each row, some of the tuples need to be reused: the first time such a tuple is used, only the first of its two generated numbers is taken; the second time, the second number is taken and the first one is ignored.
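
To make this mapping concrete, here is a minimal sketch in plain Python (not the actual Heat code; the helper name flat_index_to_counter and the counter threshold max_second are placeholders for illustration):

# Sketch (not the Heat implementation): map a row-major flat index of the
# output tensor to the counter tuple that produces it and to the half
# (0 = first, 1 = second) of the two values that tuple generates.
def flat_index_to_counter(flat_index, max_second=2**32):
    tuple_index = flat_index // 2          # each tuple yields two values
    half = flat_index % 2                  # which of the two values is used
    first = tuple_index // max_second      # first counter component
    second = tuple_index % max_second      # second counter component
    return (first, second), half

# The 3x5 example above: 15 elements map to tuples 0..7; only the first
# value of (0, 7) is used, and (0, 2) is split across two rows.
for i in range(15):
    print(i, flat_index_to_counter(i))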

Now these tuples and the resulting random numbers should be created in a distributed fashion across all available processes. Therefore, each process gets an equal share of the final shape and fills it with its random values. In our example, a split along axis 0 across two processes results in the following:

Proc 1:
    |(0,  0) (0,  1) (0,| 2)
 (0,| 2) (0,  3) (0,  4)|
-----------------------------
Proc 2:
    |(0,  5) (0,  6) (0,| 7)

In this case each process only needs to know at which offset the tuple sequence should start and how many values are needed.
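
As a sketch (again plain Python with made-up names, assuming a row-major layout): each process can derive its counter tuples from its flat offset and local element count alone, because its elements are contiguous in the flattened global tensor.

# Sketch for split=0 (assumed row-major layout, hypothetical helper name):
# a process needs the tuples from start_tuple to end_tuple, possibly
# skipping the first value of the first tuple and/or the second value of
# the last tuple when a tuple is shared with a neighbouring process.
def local_counter_range(global_shape, offset_rows, local_rows):
    row_len = 1
    for dim in global_shape[1:]:
        row_len *= dim
    flat_offset = offset_rows * row_len            # first global flat index
    n_local = local_rows * row_len                 # number of local elements

    start_tuple = flat_offset // 2                 # first counter tuple needed
    end_tuple = (flat_offset + n_local - 1) // 2   # last counter tuple needed
    skip_first_value = flat_offset % 2 == 1        # tuple shared with prev. proc
    skip_last_value = (flat_offset + n_local) % 2 == 1
    return start_tuple, end_tuple, skip_first_value, skip_last_value

# Example from above: 3x5 tensor, split=0, two processes
print(local_counter_range((3, 5), 0, 2))  # Proc 1: rows 0-1 -> tuples 0..4
print(local_counter_range((3, 5), 2, 1))  # Proc 2: row 2    -> tuples 5..7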

The difficult cases start with split != 0, e.g. a split along axis 1 across two processes:

Proc 1:              // Proc 2:
    |(0,  0) (0,| 1) // (0,| 1) (0,| 2)
 (0,| 2) (0,  3)|    //    |(0,  4)|
    |(0,  5) (0,| 6) // (0,| 6) (0,| 7)

This is the point where I am stuck: I cannot think of an efficient algorithm that creates the correct tuples for a split axis != 0.
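
To illustrate why this is hard (this is not a solution, just a plain-Python sketch with a made-up helper name): under split=1 the global row-major flat indices owned by each process are no longer contiguous, so the required counter tuples cannot be described by a simple start offset and a count.

# Sketch: global row-major flat indices owned by a process that holds the
# columns [col_offset, col_offset + local_cols) of a 2D tensor.
def global_flat_indices(global_shape, col_offset, local_cols):
    rows, cols = global_shape
    return [r * cols + (col_offset + c)
            for r in range(rows)
            for c in range(local_cols)]

# 3x5 tensor from above, split=1, two processes (3 and 2 columns)
print(global_flat_indices((3, 5), 0, 3))  # Proc 1: [0, 1, 2, 5, 6, 7, 10, 11, 12]
print(global_flat_indices((3, 5), 3, 2))  # Proc 2: [3, 4, 8, 9, 13, 14]
# Dividing these indices by 2 gives the counter tuples per process:
# Proc 1 needs {0, 1, 2, 3, 5, 6} and Proc 2 needs {1, 2, 4, 6, 7}, i.e.
# overlapping, non-contiguous sets, unlike the split=0 case.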

Currently I am treating these cases as if they were the same as a split along axis 0. This still produces a proper random distribution, but the numbers end up at different positions depending on the initial split axis. In code:

>>> ht.random.seed(1)
>>> a = ht.random.rand(4, 4, split=0)
>>> a = a.numpy().flatten() # Unsplit tensor and flatten values to 1D shape
>>> ht.random.seed(1)
>>> b = ht.random.rand(16, split=None).numpy() # Same number of elements, same alignment
>>> np.array_equal(a, b)
True
>>> ht.random.seed(1)
>>> c = ht.random.rand(4, 4, split=1) # Same number of elements, different alignment
>>> c = c.numpy().flatten()
>>> np.array_equal(a, c)
False
>>> a, b, c = np.sort(a), np.sort(b), np.sort(c) # Reorder elements
>>> np.array_equal(a, b)
True
>>> np.array_equal(a, c)
True

TheSlimvReal commented on Dec 06 '19

@TheSlimvReal Is this behaviour supposed to occur exclusively in the distributed case? I am unable to reproduce it locally.

lenablind commented on Feb 04 '21

@lenablind Yes the problem only occurs if the array is split across multiple processes.

TheSlimvReal commented on Feb 09 '21

@TheSlimvReal Alright, that makes sense. Thank you for the feedback.

lenablind commented on Feb 10 '21

Good luck with the issue, I am curious if you can come up with a solution. I haven't been able to.

TheSlimvReal commented on Feb 10 '21

Thank you, so am I! I'll keep you up to date about the progress.

lenablind commented on Feb 11 '21

This issue is still open. @TheSlimvReal thanks again for the detailed explanation of the problem!


Reviewed within #1109

ClaudiaComito commented on Aug 04 '23