Atharva Vaidya

Results 9 comments of Atharva Vaidya

It's happening after commit fe17340. ![image](https://user-images.githubusercontent.com/93036639/187471246-fbc2c186-32c4-4d4e-a820-640fcc474323.png) I believe that in optimized mode the model isn't being transferred to the GPU at all now, hence the multiple-devices error.

@oobabooga are you using the optimized switch?

> If you do not use the option `--no-stream`, then: `CUDA error: an illegal memory access was encountered`

I have the exact...

> I can run it with CPU, but still get the error with GPU: `python server.py --listen --model llama-7b --load-in-8bit --lora alpaca-lora-7b`

Hi, did you find any solution for this? I'm...

> Fwiw, there's already a working implementation in the v21 branch of my dreambooth extension. It should get merged into main today.

I just tried it, but I can't seem...

> Also, disabling `--opt-channelslast` reduced the frequency of OOM for me, in addition to the above.

I think switching from xformers to flash_attention might also save a bit more RAM...

Thanks a lot for this! A note for anyone who gets `NameError: name 'BitsAndBytesConfig' is not defined`: use the second method, i.e. add `params.extend(["load_in_8bit=True", "llm_int8_enable_fp32_cpu_offload=True"])` below the pre-existing code (instead...
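For context, the second method above just appends two flag strings to a pre-existing parameter list. A minimal sketch of that (the `params` variable and its initial contents here are hypothetical placeholders; only the two appended strings come from the comment):

```python
# Hypothetical pre-existing launch-parameter list; the name `params` and its
# initial contents are placeholders, not the actual code being patched.
params = ["pretrained_model_name_or_path=model"]

# The workaround from the comment: append the 8-bit loading flags so the
# model loads in int8 with fp32 CPU offload enabled.
params.extend(["load_in_8bit=True", "llm_int8_enable_fp32_cpu_offload=True"])

print(params)
```

This avoids referencing `BitsAndBytesConfig` directly, which is why it sidesteps the `NameError` when that class hasn't been imported.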

> If you're willing to manually retype the conversation history, then you can get your question answered, like so:

Thanks! I guess that'll do for now. Hoping that it is...