
Generates the same random ndarray when the size of the random ndarray is changed

Open neko-suki opened this issue 6 years ago • 6 comments

  • The following simple program shows that when I pass different shapes to clpy.random.rand(), the first few elements of the resulting ndarrays are the same.
import clpy
import time

tmp = clpy.random.rand(5)
print(tmp)

tmp = clpy.random.rand(2)
print(tmp)
  • The result is as follows. The first two elements are the same.
$ python sample_random4.py 
[0.44014896 0.8914092  0.34266944 0.79392968 0.24518993]
[0.44014896 0.8914092 ]

The cause is that the same seed, self.seed_value, is reused whenever the size of the random number array changes.

https://github.com/fixstars/clpy/blob/clpy/clpy/random/generator.py#L178
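
The effect can be reproduced without a GPU using a toy model. The following is a minimal sketch, not clpy's actual kernels: `lcg_step`, `init_states`, and `rand` are hypothetical stand-ins, and the LCG constants are the common Numerical Recipes ones, chosen only for illustration. It assumes that every element's state is re-derived from one fixed seed value each time the array is built.

```python
# Toy model of the reported behaviour; NOT clpy's real implementation.
MASK32 = 0xFFFFFFFF


def lcg_step(state):
    """Advance a 32-bit linear congruential generator by one step."""
    return (1664525 * state + 1013904223) & MASK32


def init_states(seed_value, size):
    """Derive one state per element from a single seed (hypothetical)."""
    return [lcg_step((seed_value + i) & MASK32) for i in range(size)]


def rand(seed_value, size):
    """Rebuild all states from the same seed_value on every call."""
    states = init_states(seed_value, size)
    return [lcg_step(s) / 2**32 for s in states]


a = rand(12345, 5)
b = rand(12345, 2)
print(a[:2] == b)  # True: the shorter array is a prefix of the longer one
```

Because nothing but the fixed seed and the element index feeds into each state, changing the size cannot change the leading values.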

neko-suki avatar Feb 01 '19 01:02 neko-suki

Here is another example, since the example code in https://github.com/fixstars/clpy/issues/165#issue-405507707 was resolved by the change for #166.

  • Code
import clpy
import time
tmp = clpy.random.rand(5)
print(tmp)
tmp = clpy.random.rand(6)
print(tmp)
  • Result on the Secondary Machine
[0.34526245 0.79652269 0.24778294 0.69904318 0.15030342]
[0.34526245 0.79652269 0.24778294 0.69904318 0.15030342 0.60156366]

neko-suki avatar Feb 05 '19 07:02 neko-suki

I tried reusing the first element of the seed array, self.seed_array[0], as the initial seed. This solves the problem.

The result of the code in https://github.com/fixstars/clpy/issues/165#issuecomment-460541087 is as follows.

$ python new_random.py 
[0.90678602 0.35804626 0.8093065  0.26056675 0.71182699]
[0.74465217 0.19591241 0.64717265 0.0984329  0.54969314 0.00095338]
  • The diff is as follows.
$ git diff
diff --git a/clpy/random/generator.py b/clpy/random/generator.py
index 5430281..c6e40fa 100644
--- a/clpy/random/generator.py
+++ b/clpy/random/generator.py
@@ -176,9 +176,13 @@ class RandomState(object):
 
         if (not isinstance(self.seed_array, clpy.ndarray)
                 or self.seed_array.size < array_size):
+            if (not isinstance(self.seed_array, clpy.ndarray)):
+                initial_seed = self.seed_value
+            else:
+                initial_seed = self.seed_array[0]
             self.seed_array = clpy.empty(size, "uint")
             tmp_seed_array = clpy.empty(size, "uint")
-            tmp_seed_array.fill(self.seed_value)
+            tmp_seed_array.fill(initial_seed)
             RandomState._init_kernel(tmp_seed_array, self.seed_array)
             # not to use similar number for the first generation
             RandomState._lcg_kernel(self.seed_array, out)
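
The idea behind this diff can be sketched in plain Python. This is only an illustration under assumptions: `ToyRandomState` and `lcg_step` are hypothetical, lists stand in for clpy.ndarray, and the state derivation is a simplified replacement for `_init_kernel` and `_lcg_kernel`. It keeps the same branch structure as the diff: use self.seed_value the first time, and the first element of the existing seed array when the array has to grow.

```python
# Toy model of the change in the diff; NOT clpy's real implementation.
MASK32 = 0xFFFFFFFF


def lcg_step(state):
    """Advance a 32-bit linear congruential generator by one step."""
    return (1664525 * state + 1013904223) & MASK32


class ToyRandomState:
    def __init__(self, seed_value):
        self.seed_value = seed_value
        self.seed_array = None  # stands in for a clpy.ndarray of states

    def rand(self, size):
        if self.seed_array is None or len(self.seed_array) < size:
            if self.seed_array is None:
                initial_seed = self.seed_value
            else:
                # Reuse already-advanced state instead of the fixed seed.
                initial_seed = self.seed_array[0]
            self.seed_array = [
                lcg_step((initial_seed + i) & MASK32) for i in range(size)
            ]
        # Advance the first `size` states and map them to [0, 1).
        for i in range(size):
            self.seed_array[i] = lcg_step(self.seed_array[i])
        return [self.seed_array[i] / 2**32 for i in range(size)]


rs = ToyRandomState(12345)
first = rs.rand(5)
second = rs.rand(6)
print(first == second[:5])  # False: the grown array no longer repeats the prefix
```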

neko-suki avatar Feb 05 '19 10:02 neko-suki

How about adding some fluctuation (e.g. the current time) to the random number generation each time rand() is called?

LWisteria avatar Feb 05 '19 10:02 LWisteria

@LWisteria From a performance point of view, the program takes much longer only the first time it passes through initial_seed = self.seed_array[0]. I don't know why this happens.

It is reproducible every time, and it happens on both the Primary Machine and the Secondary Machine.

  • code
import clpy
import time

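base = 100000
# Initial call.
# Note: time.time() returns seconds, so the printed values are in seconds
# even though the label says "msec".
beg = time.time()
clpy.random.rand(base)
end = time.time()
print("time = {:.5f} msec".format(end - beg))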

for i in range(20):
    beg = time.time()
    clpy.random.rand(base + 1 + i)
    end = time.time()
    print("time = {:.5f} msec".format(end - beg))
  • Result on the Primary Machine (Vega)
time = 0.13137 msec
time = 0.04378 msec
time = 0.00024 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00009 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00011 msec
time = 0.00011 msec
time = 0.00075 msec
time = 0.00011 msec
  • Result on the Secondary Machine (titanv)
$ python new_random4.py 
time = 0.12433 msec
time = 0.04277 msec
time = 0.00138 msec
time = 0.00150 msec
time = 0.00104 msec
time = 0.00101 msec
time = 0.00093 msec
time = 0.00089 msec
time = 0.00088 msec
time = 0.00088 msec
time = 0.00089 msec
time = 0.00089 msec
time = 0.00089 msec
time = 0.00090 msec
time = 0.00090 msec
time = 0.00088 msec
time = 0.00089 msec
time = 0.00094 msec
time = 0.00063 msec
time = 0.00115 msec
time = 0.00064 msec
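
To compare the first call against later calls more directly, the measurement can be factored into a helper. The sketch below is an assumption-laden illustration: `measure_ms` and `workload` are hypothetical names, `workload` is a CPU stand-in for clpy.random.rand(n), and time.perf_counter() is used with an explicit conversion so that the printed unit really is milliseconds.

```python
import time


def measure_ms(fn, repeats):
    """Return the wall-clock duration of each call to fn, in milliseconds."""
    durations = []
    for _ in range(repeats):
        beg = time.perf_counter()
        fn()
        end = time.perf_counter()
        durations.append((end - beg) * 1000.0)  # seconds -> milliseconds
    return durations


def workload():
    # Stand-in for clpy.random.rand(n); swap in the real call where clpy is installed.
    return sum(range(100000))


times = measure_ms(workload, 5)
later = times[1:]
print("first call  = {:.5f} msec".format(times[0]))
print("later calls = {:.5f} msec (mean)".format(sum(later) / len(later)))
```

Reporting the first call separately keeps any one-time start-up cost out of the average for the remaining calls.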

neko-suki avatar Feb 05 '19 10:02 neko-suki

it takes much longer time only the first time the program

I don't think this is a problem with the random generator. It happens the first time any cupy/clpy kernel is executed, because the runtime library needs to initialize something.

Anyway, wasn't that comment meant as a reply to my comment?

How about adding some fluctuation (e.g. the current time) to the random number generation each time rand() is called?

I mean, this issue (not the performance one, which was already solved in #162) could be solved if you pass a time value as a kernel argument and add it to each thread's seed.

LWisteria avatar Feb 05 '19 23:02 LWisteria

I'm sorry I misread your comments.

I mean, this issue (not the performance one, which was already solved in #162) could be solved if you pass a time value as a kernel argument and add it to each thread's seed.

Getting the time value on each call and using it for the initial seed is a good idea. I'll do that. Thank you for your comments.
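
A rough sketch of that idea in plain Python, under stated assumptions: `rand_with_time` and `lcg_step` are hypothetical, the list comprehension stands in for the per-thread kernel, and the `now` parameter exists only so the behaviour can be checked deterministically. In clpy the time value would be passed to the kernel as an argument instead.

```python
import time

# Toy model of time-perturbed seeding; NOT clpy's real implementation.
MASK32 = 0xFFFFFFFF


def lcg_step(state):
    """Advance a 32-bit linear congruential generator by one step."""
    return (1664525 * state + 1013904223) & MASK32


def rand_with_time(seed_value, size, now=None):
    """Add a time-derived offset to every element's seed before generating."""
    if now is None:
        now = time.time_ns()  # nanoseconds since the epoch
    offset = now & MASK32
    states = [
        lcg_step((seed_value + offset + i) & MASK32) for i in range(size)
    ]
    return [lcg_step(s) / 2**32 for s in states]


print(rand_with_time(12345, 5))
print(rand_with_time(12345, 6))
```

Simply adding the time means that two calls with nearby time values produce index-shifted copies of the same sequence, so a real kernel would probably want stronger mixing of the time value into each seed.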

neko-suki avatar Feb 07 '19 05:02 neko-suki