
Tests for model (un)pickle

mdymczyk opened this issue May 04 '18 · 7 comments

We should add tests for saving and loading pickled models (similar to what we already have for XGBoost) for all algorithms (POGS-based, KMeans, TSVD, PCA) to verify that we can actually save and load all of our models.
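For reference, such a round-trip test could look roughly like the sketch below. It reuses the KMeans API shown later in this thread; the TSVD, PCA and POGS-based cases would need their actual class names and parameters plugged in.

```python
import pickle

import numpy as np
import h2o4gpu


def test_kmeans_pickle_roundtrip():
    X = np.array([[1., 1.], [1., 4.], [1., 0.]])
    model = h2o4gpu.KMeans(n_clusters=2, random_state=1234).fit(X)
    # Serialize and restore via an in-memory round trip.
    restored = pickle.loads(pickle.dumps(model))
    # Fitted state and predictions should survive the round trip.
    np.testing.assert_array_equal(model.cluster_centers_, restored.cluster_centers_)
    np.testing.assert_array_equal(model.predict(X), restored.predict(X))
```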

mdymczyk avatar May 04 '18 14:05 mdymczyk

We don't have such tests for GPU KMeans and GPU SVD. XGBoost has a special hook that copies over (in C) any references held in Python; we can do the same.
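The general shape of such a hook, as a rough sketch only (the `_copy_from_backend`/`_rebuild_backend` helpers are hypothetical placeholders, not existing h2o4gpu functions):

```python
class GPUSolverMixin:
    def __getstate__(self):
        state = self.__dict__.copy()
        # Hypothetical helper: serialize the C/GPU-side state into a plain,
        # picklable Python buffer, loosely mirroring the XGBoost hook described above.
        state['_native_state'] = self._copy_from_backend()
        return state

    def __setstate__(self, state):
        native_state = state.pop('_native_state', None)
        self.__dict__.update(state)
        if native_state is not None:
            # Hypothetical helper: push the buffer back into a fresh native handle.
            self._rebuild_backend(native_state)
```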

pseudotensor avatar May 04 '18 16:05 pseudotensor

@pseudotensor do we really need that, though? At least for KMeans the only thing we need is the centroids, which we already have in Python as a numpy array, so we could just pickle and unpickle that. After unpickling we can pass it (as we do now) to the C backend, no?
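A minimal sketch of that "centroids only" idea, assuming that assigning `cluster_centers_` on a fresh estimator is enough for `predict()` to work (the session below suggests pickling the whole object already works anyway):

```python
import pickle

import numpy as np
import h2o4gpu

X = np.array([[1., 1.], [1., 4.], [1., 0.]])
model = h2o4gpu.KMeans(n_clusters=2, random_state=1234).fit(X)

blob = pickle.dumps(model.cluster_centers_)      # persist just the numpy array

restored = h2o4gpu.KMeans(n_clusters=2)
restored.cluster_centers_ = pickle.loads(blob)   # assumption: enough for predict()
print(restored.predict(X))
```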

mdymczyk avatar May 06 '18 22:05 mdymczyk

For example, it seems to work out of the box for KMeans (verbose logs left in to show it's running on GPUs):

>>> import pickle
>>> import h2o4gpu
>>> import numpy as np
>>>
>>> X = np.array([[1.,1.], [1.,4.], [1.,0.]])
>>>
>>> model = h2o4gpu.KMeans(verbose=100, n_clusters=2,random_state=1234).fit(X)

Using GPU KMeans solver with 2 GPUs.

Using h2o4gpu backend.

Using GPU KMeans solver with 2 GPUs.

Detected np.float64 data
2 gpus.
Copying data to device: 1
Copying data to device: 0
Threshold triggered. Terminating early.
  Time fit: 0.00288296 s
Timetransfer: 0.0531921 Timefit: 0.00288296 Timecleanup: 0.00114107
>>> model.cluster_centers_
array([[1., 1.],
       [1., 4.]])
>>>
>>> pickle.dump( model, open( "save.p", "wb" ) )
>>> unpickled_model = pickle.load( open( "save.p", "rb" ) )
>>> unpickled_model.cluster_centers_
array([[1., 1.],
       [1., 4.]])
>>> model.predict(X)

Using GPU KMeans solver with 2 GPUs.

Detected np.float64 data
Detected np.float64 data
2 gpus.
array([1, 0, 0], dtype=int32)
>>> unpickled_model.predict(X)

Using GPU KMeans solver with 2 GPUs.

Detected np.float64 data
Detected np.float64 data
2 gpus.
array([1, 0, 0], dtype=int32)

mdymczyk avatar May 06 '18 22:05 mdymczyk

Yes, it should be easy (or already work) for KMeans, since the only thing fit does is find the centroids.

pseudotensor avatar May 06 '18 23:05 pseudotensor

@pseudotensor yes, I thought it would work out of the box for all our models since we copy all the necessary data from C to Python, but @wenphan noticed that POGS-based models were having problems pickling (an ask from a potential user). From the log it had something to do with CDLL and/or ctypes, so for POGS we may need to do some more work, but hopefully KMeans and SVD are already fine.
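If the ctypes handle is the culprit, one common remedy is to exclude it from the pickled state and re-open the shared library on load. A rough sketch, not the actual h2o4gpu solver code (class and library names are illustrative):

```python
import ctypes


class PogsLikeSolver:
    """Illustrative only; shows the CDLL drop/reload pattern."""

    _LIB_PATH = "libsolver.so"  # illustrative library name

    def __init__(self):
        self._lib = ctypes.CDLL(self._LIB_PATH)

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop('_lib', None)  # CDLL handles cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Re-open the shared library when the object is loaded again.
        self._lib = ctypes.CDLL(self._LIB_PATH)
```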

mdymczyk avatar May 06 '18 23:05 mdymczyk

We should move forward on dropping POGS anyway. I have a gblinear wrapper we can use as a baseline that does lambda search with warm start. We can keep the rest of the CV-fold logic but do it in Python instead of C. That's probably easiest.
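For reference, the warm-started lambda search can be sketched with plain xgboost from Python roughly like this (parameters and the lambda grid are illustrative, not the actual wrapper):

```python
import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = X @ rng.rand(10) + 0.1 * rng.rand(200)
dtrain = xgb.DMatrix(X, label=y)

booster = None
for lam in [10.0, 1.0, 0.1, 0.01]:  # decreasing regularization path
    params = {"booster": "gblinear", "lambda": lam, "objective": "reg:squarederror"}
    # Passing the previous booster via xgb_model warm-starts the next lambda.
    booster = xgb.train(params, dtrain, num_boost_round=20, xgb_model=booster)
    preds = booster.predict(dtrain)
    rmse = float(np.sqrt(np.mean((preds - y) ** 2)))
    print("lambda=%g  train RMSE=%.4f" % (lam, rmse))
```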

pseudotensor avatar May 06 '18 23:05 pseudotensor

@pseudotensor yes, once @RAMitchell's implementation is stable enough I'm 100% for removing POGS from the codebase altogether.

mdymczyk avatar May 06 '18 23:05 mdymczyk