h2o4gpu
kmeans python iris test fails for multi-GPU.
Environment (for bugs)
- OS platform, distribution and version (e.g. Linux Ubuntu 16.04): Linux Ubuntu 16.04
- Installed from (source or binary): source
- Version: @commitID abe45eeb717d5ab4fee2f5b1386c82c93f14ab33
- Python version (optional): 3.5
- CUDA/cuDNN version: cuda v9.0, cudnn v7.1, driver v384.125
- GPU model (optional): Tesla V100 (from DGX1-Volta)
- CPU model: Intel(R) Xeon(R) CPU E5-2698 v4
- RAM available: 512GB
Description
make dotest fails in the multi-GPU case of the kmeans tests. The failing test is 'test_fit_iris', and within that test only the multi-GPU case fails.
Repro instructions
$ pytest -s --verbose --durations=10 -n 1 -vv --fulltrace --full-trace --junit-xml=build/test-reports/h2o4gpu-test.xml tests_open/kmeans 2>&1 | tee run.log
The run.log is attached below for your perusal: run.log
Interestingly, if the multi-GPU case is run with n_gpus=2, the above test passes.
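To narrow this down, it may help to compare the centroids from a single-GPU fit with those from a multi-GPU fit directly. Cluster order is arbitrary, so the comparison has to ignore row order. Below is a minimal sketch of such a check. The helper name centroids_match, the tolerance, and the centroid values are all hypothetical and only for illustration. This is not the check that test_fit_iris performs.

```python
from itertools import permutations
from math import sqrt


def euclidean(p, q):
    """Euclidean distance between two points given as sequences of floats."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def centroids_match(a, b, tol=1e-2):
    """Return True if centroid sets a and b agree up to row order.

    Brute-forces every ordering of b, which is fine for small k
    (e.g. the 3 clusters used for iris) but not for large k.
    """
    if len(a) != len(b):
        return False
    for perm in permutations(range(len(b))):
        if all(euclidean(a[i], b[j]) <= tol for i, j in enumerate(perm)):
            return True
    return False


# Hypothetical centroids standing in for a 1-GPU fit and a multi-GPU fit.
single_gpu = [[5.0, 3.4], [5.9, 2.7], [6.8, 3.1]]
multi_gpu = [[6.8, 3.1], [5.0, 3.4], [5.9, 2.7]]  # same points, different order
diverged = [[0.0, 0.0], [5.0, 3.4], [5.9, 2.7]]   # one centroid is way off

print(centroids_match(single_gpu, multi_gpu))
print(centroids_match(single_gpu, diverged))
```

Running the same comparison for n_gpus=1 against n_gpus=2, and then against the full GPU count, would show whether the centroids themselves diverge or only the downstream predictions do.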
Thanks. @mdymczyk do you have any ideas? We obviously run this test ourselves, but only on 2-GPU systems on Jenkins. Do you expect this test to actually pass (i.e. for the 1-GPU results to agree with the 4-GPU results)?
@pseudotensor Not sure yet. It should pass on any number of GPUs, but maybe there's a bug somewhere. I need to look into it with a profiler. When I discussed this with @teju85, he also mentioned that the predictions are way off, so there might be a bug somewhere that we're not catching with our tests.