Kirill R.
Mostly fixed RAM usage during weights concatenation in #710, though it still peaks at 33G (previously 50G). #712 adds saving of merged weights using `safetensors`; 13B loads much faster.
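A minimal sketch of the kind of fix that reduces the concatenation peak (this is an illustration in plain numpy, not the actual PR code): `np.concatenate` builds the output while every input shard is still alive, so peak RAM is roughly inputs plus output; preallocating the output and dropping each shard after copying keeps the peak closer to the output size alone.

```python
import numpy as np

def concat_preallocated(parts, axis=0):
    # Preallocate the merged buffer, then copy each shard into place and
    # drop the reference so it can be freed before the next copy.
    out_shape = list(parts[0].shape)
    out_shape[axis] = sum(p.shape[axis] for p in parts)
    out = np.empty(out_shape, dtype=parts[0].dtype)
    offset = 0
    for i, p in enumerate(parts):
        n = p.shape[axis]
        index = [slice(None)] * out.ndim
        index[axis] = slice(offset, offset + n)
        out[tuple(index)] = p
        parts[i] = None  # release the shard
        offset += n
    return out

a = np.ones((2, 3), dtype=np.float16)
b = np.full((4, 3), 2.0, dtype=np.float16)
merged = concat_preallocated([a, b], axis=0)
print(merged.shape)  # (6, 3)
```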
> Does the 13B model work now?

No, the weights concatenation code is broken.
Seems mainly useful for saving concatenated weights; it also loads a little faster on CPU (I added a hack for numpy). If you think it's not needed in the example, feel...
> Re: 13B, I assume you save and load, right? You can do that without safetensors.

Yep, saving first. You mean it can be saved with pickle?

> I expect pickled...
It's implemented as pads and an add. Is there a way to fix it without adding a special op? 🤔
It was `CPU=1`. `CLANG=1` takes 51s to concat; `TORCH=1` on CPU is 8s.
3B f16 runs on a 2080 Ti, though you might need a lot of RAM to convert f32 to f16; the peak is around 24G. Calculation of the lower bound of VRAM in GiB:...
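The lower-bound calculation referred to is presumably parameter count times bytes per element, divided by 1024³; a sketch with an assumed round 3B parameter count (not the model's exact count):

```python
def vram_lower_bound_gib(n_params, bytes_per_param):
    # Weights only, ignoring activations and KV cache, hence a lower bound.
    return n_params * bytes_per_param / 1024**3

# Illustrative numbers; 3_000_000_000 params is an assumption.
print(round(vram_lower_bound_gib(3_000_000_000, 2), 2))  # f16
print(round(vram_lower_bound_gib(3_000_000_000, 4), 2))  # f32
```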
Ok, the size seems about right then.
```python
# took the size from disk. huggingface shows it in / 1000**3
>>> (10_161_140_290+4_656_666_941) / 1024 / 1024 / 1024
13.800158380530775
>>> (3_638_525_952...
```
What OS? Also check the shasum just in case:
```
$ shasum weights/sd-v1-4.ckpt
210783247af4f65a3d23d026490cc37a670964dd  weights/sd-v1-4.ckpt
```