Atharva Vaidya
It's happening after commit fe17340. I believe that with optimized mode the model isn't getting transferred to the GPU at all now, hence the multiple-devices error.
@oobabooga, are you using the optimised switch?
RTX 2060 Mobile 6GB
> If you do not use the option `--no-stream`, then: `CUDA error: an illegal memory access was encountered` I have the exact...
> I can run it with CPU, but still get an error with GPU: `python server.py --listen --model llama-7b --load-in-8bit --lora alpaca-lora-7b` Hi, did you find any solution for this? I'm...
> Fwiw, there's already a working implementation in the v21 branch of my Dreambooth extension. It should get merged into main today. I just tried it, but I can't seem...
> Also, disabling `--opt-channelslast` reduced the frequency of OOM for me, in addition to the above. I think switching from xformers to flash_attention might also save a bit more RAM...
Thanks a lot for this! A note for anyone who gets `NameError: name 'BitsAndBytesConfig' is not defined`: use the second method, i.e. add `params.extend(["load_in_8bit=True", "llm_int8_enable_fp32_cpu_offload=True"])` below the pre-existing code (instead...
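For clarity, here is a minimal sketch of that second workaround. The `params` name follows the comment above, but the pre-existing entries in the list are an assumption, not the actual webui source:

```python
# Hypothetical sketch: rather than constructing a BitsAndBytesConfig
# (which raises NameError on transformers versions that don't export it),
# append the options as plain strings to the keyword-argument list that
# later gets passed along to from_pretrained().
params = ["low_cpu_mem_usage=True"]  # placeholder for the pre-existing entries

# The two options from the workaround, appended below the existing code:
params.extend(["load_in_8bit=True", "llm_int8_enable_fp32_cpu_offload=True"])

print(params)
```

The idea is that both code paths end up feeding the same keyword arguments to the model loader; this one just avoids referencing the `BitsAndBytesConfig` class directly.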
> If you're willing to manually retype the conversation history, then you can get your question answered, like so: Thanks! I guess that'll do for now. Hoping that it is...