Silver267
[Some information](https://www.deepspeed.ai/2022/09/09/zero-inference.html) suggests that ZeRO-Inference could improve the performance of offloading to RAM/NVMe. I don't know if Hugging Face's accelerate already uses it, but if not, it would be...
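For context, ZeRO-Inference is driven by a DeepSpeed JSON config. A minimal sketch of what parameter offloading to CPU RAM might look like (field names follow the DeepSpeed config schema; the exact values here are illustrative, not a tested setup):

```python
# Hypothetical minimal ZeRO-Inference config: ZeRO stage 3 with model
# parameters offloaded to CPU RAM. Swapping "cpu" for "nvme" (plus an
# nvme_path entry) would offload to disk instead.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",      # where parameters live between uses
            "pin_memory": True,   # pinned host memory for faster transfers
        },
    },
    "train_micro_batch_size_per_gpu": 1,
}
```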
- [ ] Implement a mechanism to sync conversation and panel state between all devices connected to the gradio link.
- [x] Add option (or another python file) to convert...
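One way the sync mechanism could work is a server-side shared state that notifies every connected client on change. A minimal sketch of that pattern (this is not gradio's session model; the class and method names are hypothetical, for illustration only):

```python
import threading
from typing import Any, Callable, Dict, List

class SharedChatState:
    """Hypothetical shared-state store: each connected device registers a
    callback, and any update to the conversation history is broadcast to
    all subscribers so every device stays in sync."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, Any] = {"history": []}
        self._subscribers: List[Callable[[Dict[str, Any]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, Any]], None]) -> None:
        # Called once per connected device/session.
        with self._lock:
            self._subscribers.append(callback)

    def append_message(self, message: str) -> None:
        # Mutate state under the lock, then push a snapshot to everyone.
        with self._lock:
            self._state["history"].append(message)
            snapshot = {"history": list(self._state["history"])}
            for cb in self._subscribers:
                cb(snapshot)
```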
Is it theoretically possible to send pre-quantized 4-bit llama layers to RAM to reduce RAM usage and improve I/O performance? Currently, offloading a 33B model to RAM requires 64GB+...
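A back-of-envelope calculation shows why offloading pre-quantized layers would cut the requirement roughly 4x (weights only; activations and overhead are ignored in this sketch):

```python
def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate memory (GiB) to hold model weights alone."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1024**3

fp16_gb = weight_memory_gb(33, 16)  # 33B params at 16-bit: ~61.5 GiB
q4_gb = weight_memory_gb(33, 4)     # same model at 4-bit:  ~15.4 GiB
```

This matches the observation that a 33B fp16 model needs a 64GB+ machine to offload, while a 4-bit copy would fit comfortably in 32GB of RAM.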
**Description** Idea obtained from #403. In chat mode, injecting the system time before each of the bot's replies (and into the context) could potentially give LLMs time awareness, helping them perform better...
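The injection itself could be a one-line prompt transform applied before each generation. A minimal sketch (the function name and bracket format are made up for illustration):

```python
from datetime import datetime

def inject_time(context: str, fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Hypothetical helper: prepend the current system time to the prompt
    context before each bot reply, so the model can reference it."""
    return f"[Current time: {datetime.now().strftime(fmt)}]\n{context}"
```

Calling `inject_time("User: what day is it?")` would yield the original context with a `[Current time: ...]` line prepended; regenerating a reply later would naturally pick up the new timestamp.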