
Weight Sharding

winglian opened this issue 1 year ago

I'm trying to quantize 405b, but I'm unable to upload it to HF since it's ~200GB and HF LFS has a 50GB per-file limit. Is there a correct way to shard the model file so it can be loaded again with AutoHQQ?

winglian avatar Aug 02 '24 00:08 winglian

There's an ongoing pull request for sharded safetensors serialization: https://github.com/huggingface/transformers/pull/32379 Once it's merged, it will be possible to save hqq-quantized models directly via model.save_pretrained as sharded safetensors.

mobicham avatar Aug 02 '24 07:08 mobicham

Closing this since we are very close to full transformers serialization support here: https://github.com/huggingface/transformers/pull/33141

mobicham avatar Aug 28 '24 10:08 mobicham