Methods for model persistence/checkpointing

Open terrytangyuan opened this issue 7 years ago • 7 comments

Is this already possible in the Python API in a similar fashion to sklearn (e.g. http://scikit-learn.org/stable/modules/model_persistence.html)? What about the H2O ones? @navdeep-G

We need these methods for the R API as well, e.g. to invoke the correct methods for models using different implementations.

terrytangyuan avatar Feb 07 '18 18:02 terrytangyuan

@terrytangyuan I have not looked at this in particular, but I think we should have it. Let me see if it can be done or not.

navdeep-G avatar Feb 07 '18 18:02 navdeep-G

We don't have persistence yet.

Is this something we need right now for R? We can put this in the backlog, but we'll have to focus on pre-GTC preparations first (getting current algorithms into Flow, preparing a CUDA bootcamp, maybe a v0.1 R API).

mdymczyk avatar Feb 08 '18 01:02 mdymczyk

I think the pre-GTC priorities are the points you mentioned. @mdymczyk @ledell @terrytangyuan

navdeep-G avatar Feb 08 '18 01:02 navdeep-G

Sure, this isn’t R specific. It needs to be done in Python. Isn’t this a must for every user? Imagine you trained a model on GPUs for a couple of days and then had no way to save it and load it later. For large models, the training process usually needs to be checkpointed on a regular basis in case of failures.
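To make the checkpointing idea concrete, here is a minimal sketch using only the Python standard library. It is not h2o4gpu code: the `train` loop, the `state` dict, and the `fail_at` parameter are hypothetical stand-ins, and whether h2o4gpu models can be pickled at all is exactly the open question in this issue.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # a crash mid-write never corrupts the old checkpoint


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, total_steps, checkpoint_every, fail_at=None):
    """Toy training loop (hypothetical): 'weight' just accumulates step numbers."""
    # Resume from the last checkpoint if there is one, otherwise start fresh.
    state = load_checkpoint(path) or {"step": 0, "weight": 0.0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated crash")
        state["step"] += 1
        state["weight"] += state["step"]
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # always persist the final state
    return state
```

If a run started with `train("ckpt.pkl", 10, 5, fail_at=7)` crashes, calling `train("ckpt.pkl", 10, 5)` afterwards picks up from the step-5 checkpoint instead of starting over. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.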

terrytangyuan avatar Feb 08 '18 01:02 terrytangyuan

@terrytangyuan Yes, agreed. I think @mdymczyk's main point was to prioritize what can be done pre-GTC. I feel this is necessary regardless of GTC, but we just have to prioritize things now.

navdeep-G avatar Feb 08 '18 01:02 navdeep-G

Oh, for sure, you guys decide the priorities for this. I wasn’t aware of GTC; you probably discussed it during standup this week.

terrytangyuan avatar Feb 08 '18 01:02 terrytangyuan

Guys, when do you think this feature will be available? We can't use it for the thesis since the results are not reproducible.

bkavlak avatar Apr 20 '20 14:04 bkavlak