h2o4gpu
Methods for model persistence/checkpointing
Is this already possible in the Python API, in a similar fashion to sklearn (e.g. http://scikit-learn.org/stable/modules/model_persistence.html)? What about the H2O ones? @navdeep-G
We need these methods for the R API as well, e.g. to invoke the correct methods for models that use different implementations.
@terrytangyuan I have not looked at this in particular, but I think we should have it. Let me see if it can be done or not.
We don't have persistence yet.
Is this something we need right now for R? We can put this in the backlog, but we'll have to focus on pre-GTC preparations first (getting the current algorithms into Flow, preparing a CUDA bootcamp, maybe a v0.1 R API).
I think the pre-GTC priorities are the points you mentioned. @mdymczyk @ledell @terrytangyuan
Sure, this isn’t R specific. It needs to be done in Python. Isn’t this a must for every user? Imagine you trained a model on GPUs for a couple of days, but there is no way to save it and load it later. For large models, the training process usually needs to be checkpointed on a regular basis in case of failures.
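To make the checkpointing idea concrete, here is a minimal sketch using only the Python standard library. It is not an h2o4gpu API (the thread confirms there is none yet): `save_checkpoint`, `load_checkpoint`, `train`, and the `{"step", "weight"}` state dict are all hypothetical names for illustration, and the "model" is just a running sum standing in for real training state.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, total_steps, checkpoint_every=10, stop_after=None):
    """Toy training loop (hypothetical): resumes from the last checkpoint."""
    state = load_checkpoint(path) or {"step": 0, "weight": 0.0}
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            break  # simulate a failure partway through training
        state["weight"] += 0.5  # stand-in for one real training step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state
```

If a run dies between checkpoints, calling `train` again with the same path picks up from the last saved step instead of starting over, so at most `checkpoint_every` steps of work are lost. Note that pickle files should only be loaded from trusted sources.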
@terrytangyuan Yes, agreed. I think @mdymczyk's main point was to prioritize what can be done pre-GTC. I feel this is necessary regardless of GTC, but we just have to prioritize things now.
Oh, for sure, you guys decide the priorities for this. I wasn’t aware of GTC; you probably discussed it during standup this week.
Guys, when do you think this feature will be available? We can't use it for the thesis, since the results are not reproducible.