Dtype & HF Push Changes
This PR:
- Removes the cast to float32 in the case of LoRA + float16, so the base weights stay in float16 (a sketch of the resulting behavior follows this list)
- Adds a new mode for pushing to HF that first loads the model on CPU and then shards it across GPUs before pushing (see the second sketch below). This is probably redundant, as it should behave the same as the pure CPU path, but it does not hurt to add. It may also be that the original device is stored as a flag in the weights, which can cause downstream issues; this mode would resolve that. Ideally we would load the weights directly sharded across GPUs so they never touch CPU, but I did not manage to get that working with LoRA merging etc.
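
For the dtype change, here is a minimal sketch of what the behavior looks like after this PR, assuming a standard transformers + PEFT setup; the model id, LoRA hyperparameters, and target modules are illustrative placeholders, not the exact code in LLM Studio:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the backbone directly in float16; no blanket cast of the
# base weights to float32 anymore.
backbone = AutoModelForCausalLM.from_pretrained(
    "h2oai/h2ogpt-4096-llama2-7b",  # placeholder model id
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=4,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(backbone, lora_config)
# Base weights remain float16; only the small set of trainable LoRA
# parameters may be kept in higher precision by the training loop.
```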
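
And a minimal sketch of the new push mode, assuming a transformers + accelerate setup; the checkpoint path, memory budget, and repo id are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM
from accelerate import dispatch_model, infer_auto_device_map

# Step 1: materialize the full (merged) model on CPU.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/merged-checkpoint",  # placeholder path to merged weights
    torch_dtype=torch.float16,
)

# Step 2: shard it across the available GPUs before pushing.
device_map = infer_auto_device_map(
    model, max_memory={0: "20GiB", 1: "20GiB"}  # illustrative GPU budget
)
model = dispatch_model(model, device_map=device_map)

# Step 3: push the sharded model to the Hugging Face Hub.
model.push_to_hub("my-org/my-model")  # placeholder repo id
```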