
Different result on different machines for same input and random seed

Open NeerajSarwan opened this issue 6 years ago • 12 comments

Hi,

I am getting different results on different machines for the same inputs and random seed, though the results on any single machine are consistent.

My questions are -

  1. What might be the possible reasons for this?
  2. What can be done to avoid it?

Thanks.

NeerajSarwan avatar Mar 25 '19 12:03 NeerajSarwan

Within the library all sources of random behaviour are deferred to an instance of np.random.RandomState. This is used for random number generation when needed, but is also passed as a parameter to sklearn's GP. I don't know what else can be done, to be honest.
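To illustrate the pattern described above, here is a minimal sketch (not the library's actual internals) of how a single `np.random.RandomState` instance drives all stochastic steps, which is why repeated runs on the *same* machine are reproducible:

```python
import numpy as np

# One RandomState instance created once and reused everywhere; the same
# instance would also be passed as random_state= to sklearn's GP surrogate.
rng = np.random.RandomState(42)

# Random candidate points drawn from the shared generator; constructing
# a second RandomState(42) would reproduce them exactly on this machine.
candidates = rng.uniform(-2.0, 2.0, size=(5, 1))
print(candidates.shape)
```

The seed pins down every random draw, but it cannot pin down how the downstream floating-point arithmetic is executed, which is the crux of this issue.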

fmfn avatar Mar 25 '19 14:03 fmfn

Thanks @fmfn for the quick reply. In that case, could you let me know whether you were previously aware of this issue, or whether this thread is the first time you have encountered it?

NeerajSarwan avatar Mar 26 '19 07:03 NeerajSarwan

After np.random.RandomState was added everywhere I imagined this would not be a problem, but never really tested it, to be honest.

fmfn avatar Mar 26 '19 14:03 fmfn

To my knowledge the package is setting a seed on all sources of randomness that allow for seed control. I suspect scipy.minimize might be the source of the problem here, but, as I said, I'm not sure.
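To make the suspicion concrete, here is a hedged sketch of the pattern in question: seeded random restarts feeding `scipy.optimize.minimize`. The acquisition function below is a hypothetical stand-in, not the library's real one.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for an acquisition function (simple quadratic
# with its optimum at x = 0.3).
def neg_acquisition(x):
    return float(np.sum((x - 0.3) ** 2))

rng = np.random.RandomState(1)

# The restart points are fully reproducible thanks to the seed, but
# L-BFGS-B itself runs on floating-point kernels whose code paths
# (SIMD, BLAS builds) can differ between CPUs, so the low-order bits
# of the result may differ across machines even with identical seeds.
starts = rng.uniform(0.0, 1.0, size=3)
results = [
    minimize(neg_acquisition, x0=[s], method="L-BFGS-B", bounds=[(0.0, 1.0)])
    for s in starts
]
best = min(results, key=lambda r: r.fun)
print(best.x)
```

In an iterative loop like Bayesian optimization, a last-bit difference in one such step can steer the search toward a different region, so tiny numeric divergences compound into visibly different results.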

Help with debugging this is more than appreciated. I won't have the opportunity to chase this bug down for a little while still.

fmfn avatar Apr 05 '19 17:04 fmfn

Yes, I suspect the same. But I don't have time this weekend. I will give it a shot next weekend.

NeerajSarwan avatar Apr 06 '19 15:04 NeerajSarwan

When you find the solution, please post it in the comments. Thank you very much!

jiayeah9508 avatar Apr 07 '19 10:04 jiayeah9508

There is the same problem in skopt: https://github.com/scikit-optimize/scikit-optimize/issues/682

jiayeah9508 avatar Apr 11 '19 08:04 jiayeah9508

@jiayeah9508 Hi, do you mind sharing the processor information of the two machines on which you tested the code you posted in #682?

Specifically, I am interested in the output of the command below. I have a hunch that I need to confirm.

cat /proc/cpuinfo | grep "model name"

NeerajSarwan1 avatar Apr 11 '19 09:04 NeerajSarwan1

I use Windows, so I cannot run cat... The first machine is an Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz, and the second is an Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz. I hope this is what you need.

jiayeah9508 avatar Apr 11 '19 10:04 jiayeah9508

Yes, this is what I needed, and it strengthens my hunch. Here is what I believe the underlying problem is.

The root cause, I believe, is the underlying instruction set of the machine. Older machines configured to use the SSE4.2 instruction set behave differently from newer ones using AVX. It is possible that numpy/scipy operate differently under these instruction sets. I am still not 100% sure about this hunch, since I do not have access to an SSE4.2-only machine.
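One hypothetical way to test this hunch without access to the machines' internals: fingerprint the full bit pattern of a seeded linear-algebra result on each machine. If the digests differ for identical seeded inputs, the divergence happens below the seeding layer, in numpy's BLAS/SIMD kernels.

```python
import hashlib
import numpy as np

# Deterministic, seeded inputs: any bit-level divergence between two
# machines will show up as a different digest.
rng = np.random.RandomState(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))

# Hash the exact bytes of the matrix product, so even a one-ULP
# difference changes the output.
digest = hashlib.sha256(np.dot(a, b).tobytes()).hexdigest()
print(digest)  # compare this string between the two machines
```

Matching digests would rule out the linear-algebra layer; differing digests would support the instruction-set hypothesis.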

The first time I encountered this bug was while running test cases on a Jenkins server to which I do not have SSH access. I will try to get access to the server and see whether my current hypothesis is correct.

NeerajSarwan1 avatar Apr 11 '19 10:04 NeerajSarwan1

For reference, this link, which points at the same problem, might help.

NeerajSarwan1 avatar Apr 15 '19 11:04 NeerajSarwan1

What about AMD processors? Optimization is becoming more and more important for deep learning... That seems like bad news for Intel, haha.

jiayeah9508 avatar Apr 15 '19 12:04 jiayeah9508