transformers-bloom-inference
Are there fine-tuning and inference scripts available for int4 quantization of bloom-7b? Is it possible to limit GPU memory usage to under 10 GB?
Where can I download bloom-7b? I noticed that int8 quantization is available, but is there an option for int4 quantization?

- What is the memory overhead of int4 and int8 when fine-tuning with LoRA or P-Tuning?
- Are there any fine-tuning scripts available?
- Are there inference scripts available for int4 quantization?
- How much GPU memory is required for int4 and int8 inference, respectively?
This is not possible here, but you might want to take a look at the QLoRA paper and code: https://github.com/artidoro/qlora
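For reference, a minimal sketch of the QLoRA-style approach using `transformers`, `bitsandbytes`, and `peft` (not part of this repo's scripts). It assumes the `bigscience/bloom-7b1` checkpoint from the Hugging Face Hub; the LoRA hyperparameters below are illustrative, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "bigscience/bloom-7b1"  # BLOOM 7.1B checkpoint on the Hub

# NF4 4-bit quantization config, as described in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available devices
)

# Prepare the quantized model for k-bit training, then attach LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                # illustrative rank
    lora_alpha=32,
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

As a rough back-of-the-envelope estimate, 7.1B parameters take about 7 GB in int8 and about 4 GB in 4-bit for the weights alone, so 4-bit inference can plausibly fit in a 10 GB budget; actual usage will be higher once activations and the KV cache are included.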