Silver267
[Some information](https://www.deepspeed.ai/2022/09/09/zero-inference.html) suggests that ZeRO-Inference could improve the performance of offloading to RAM/NVMe. I don't know if Hugging Face's accelerate already uses it, but if not, it would be...
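For context, ZeRO-Inference is driven by a DeepSpeed JSON config. A minimal sketch of what parameter offloading to CPU RAM might look like (field names follow the DeepSpeed config schema; the exact values here are illustrative, not a tested setup):

```python
# Hypothetical minimal ZeRO-Inference config: ZeRO stage 3 with model
# parameters offloaded to CPU RAM. Swapping "cpu" for "nvme" (plus an
# nvme_path entry) would offload to disk instead.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",      # where parameters live between uses
            "pin_memory": True,   # pinned host memory for faster transfers
        },
    },
    "train_micro_batch_size_per_gpu": 1,
}
```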
- [ ] Implement a mechanism to sync conversation and panel state between all devices connected to the gradio link.
- [x] Add option (or another python file) to convert...
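One way the sync mechanism could work is a server-side shared state that notifies every connected client on change. A minimal sketch of that pattern (this is not gradio's session model; the class and method names are hypothetical, for illustration only):

```python
import threading
from typing import Any, Callable, Dict, List

class SharedChatState:
    """Hypothetical shared-state store: each connected device registers a
    callback, and any update to the conversation history is broadcast to
    all subscribers so every device stays in sync."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, Any] = {"history": []}
        self._subscribers: List[Callable[[Dict[str, Any]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, Any]], None]) -> None:
        # Called once per connected device/session.
        with self._lock:
            self._subscribers.append(callback)

    def append_message(self, message: str) -> None:
        # Mutate state under the lock, then push a snapshot to everyone.
        with self._lock:
            self._state["history"].append(message)
            snapshot = {"history": list(self._state["history"])}
            for cb in self._subscribers:
                cb(snapshot)
```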
Is it theoretically possible to send pre-quantized 4-bit llama layers to RAM to reduce RAM usage and improve I/O performance? Currently, offloading a 33B model to RAM requires 64GB+...
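A back-of-envelope calculation shows why offloading pre-quantized layers would cut the requirement roughly 4x (weights only; activations and overhead are ignored in this sketch):

```python
def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate memory (GiB) to hold model weights alone."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1024**3

fp16_gb = weight_memory_gb(33, 16)  # 33B params at 16-bit: ~61.5 GiB
q4_gb = weight_memory_gb(33, 4)     # same model at 4-bit:  ~15.4 GiB
```

This matches the observation that a 33B fp16 model needs a 64GB+ machine to offload, while a 4-bit copy would fit comfortably in 32GB of RAM.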
**Description** Idea obtained from #403. In chat mode, injecting the system time before each of the bot's replies (and into the context) could potentially give LLMs time awareness, helping them perform better...
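The injection itself could be a one-line prompt transform applied before each generation. A minimal sketch (the function name and bracket format are made up for illustration):

```python
from datetime import datetime

def inject_time(context: str, fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Hypothetical helper: prepend the current system time to the prompt
    context before each bot reply, so the model can reference it."""
    return f"[Current time: {datetime.now().strftime(fmt)}]\n{context}"
```

Calling `inject_time("User: what day is it?")` would yield the original context with a `[Current time: ...]` line prepended; regenerating a reply later would naturally pick up the new timestamp.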