transformers-bloom-inference
Are there fine-tuning and inference scripts available for int4 quantization of bloom-7b? Is it possible to limit GPU memory usage to under 10 GB?
Where can I download bloom-7b? I noticed that int8 quantization is available, but is there an option for int4 quantization?

- What is the memory overhead of int4 and int8 when fine-tuning with LoRA or P-Tuning?
- Are there any fine-tuning scripts available?
- Are there inference scripts available for int4 quantization?
- How much GPU memory is required for int4 and int8 inference, respectively?
This is not possible here, but you might want to take a look at the QLoRA paper and code: https://github.com/artidoro/qlora
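For reference, a minimal sketch of the QLoRA-style approach using `transformers`, `bitsandbytes`, and `peft` (not part of this repo's scripts). It assumes the `bigscience/bloom-7b1` checkpoint from the Hugging Face Hub; the LoRA hyperparameters below are illustrative, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "bigscience/bloom-7b1"  # BLOOM 7.1B checkpoint on the Hub

# NF4 4-bit quantization config, as described in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available devices
)

# Prepare the quantized model for k-bit training, then attach LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                # illustrative rank
    lora_alpha=32,
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

As a rough back-of-the-envelope estimate, 7.1B parameters take about 7 GB in int8 and about 4 GB in 4-bit for the weights alone, so 4-bit inference can plausibly fit in a 10 GB budget; actual usage will be higher once activations and the KV cache are included.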