Results: 69 comments by Kirill R.

Mostly fixed RAM usage during weights concatenation in #710, though it still peaks at 33G (previously 50G). #712 adds saving of merged weights using `safetensors`; 13B loads much faster.

> Does the 13B model work now?

no, the weights concatenation code is broken

I like my cats. I really do.

seems mainly useful for saving concatenated weights; it also loads a little faster on CPU (I added a hack for numpy). If you think it's not needed in the example, feel...

> Re: 13B, I assume you save and load, right? You can do that without safetensors.

yep, saving first. You mean it can be saved with pickle?

> I expect pickled...

it's implemented as pad and add. Is there a way to fix this without adding a special op? 🤔
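For context, a concat built from pad and add works like this (a numpy sketch of the general trick, not the actual tinygrad code): each input is zero-padded out to the full output shape so the non-zero regions don't overlap, then the padded tensors are summed.

```python
import numpy as np

a = np.ones((2, 3))
b = np.full((2, 2), 5.0)

# pad each tensor with zeros on the side where the other tensor goes
pa = np.pad(a, ((0, 0), (0, b.shape[1])))  # zeros on the right
pb = np.pad(b, ((0, 0), (a.shape[1], 0)))  # zeros on the left

# since the non-zero regions are disjoint, adding reproduces the concat
out = pa + pb
```

The cost is that each pad materializes a full-output-size tensor, which is where the extra memory and time come from compared to a dedicated concat op.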

it was `CPU=1`. `CLANG=1` takes 51s to concat; `TORCH=1` on CPU takes 8s

3B f16 runs on a 2080 Ti, though you might need a lot of RAM to convert f32 to f16; the peak is around 24G. Calculation of the lower bound of VRAM in GiB:...

Ok, the size seems about right then.

```python
# took the size from disk; huggingface shows in / 1000**3
>>> (10_161_140_290 + 4_656_666_941) / 1024 / 1024 / 1024
13.800158380530775
>>> (3_638_525_952...
```

what OS? Also check the shasum just in case

```
$ shasum weights/sd-v1-4.ckpt
210783247af4f65a3d23d026490cc37a670964dd  weights/sd-v1-4.ckpt
```