Load DialogGen in 4-bit to make usage on 24 GB consumer GPUs possible.

Status: open. Opened by Meatfucker 1 year ago; 0 comments.

This change makes DialogGen load with 4-bit quantization from bitsandbytes. This reduces overall memory usage and makes inference possible on 24 GB consumer GPUs with DialogGen enabled.
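A rough way to see why 4-bit loading helps is to estimate the memory taken by the weights alone. The sketch below is only an illustration. The parameter count is a hypothetical placeholder, not DialogGen's actual size, and the estimate ignores activations, caches, and the per-block scale factors that 4-bit schemes store. Real usage will therefore be somewhat higher.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8  # bits -> bytes
    return bytes_total / 2**30                     # bytes -> GiB


# Hypothetical parameter count chosen for illustration; not taken from the PR.
NUM_PARAMS = 7_000_000_000

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # half-precision weights
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB")
```

Under these assumptions the weights shrink by a factor of four (about 13.0 GiB down to about 3.3 GiB). That freed memory is what leaves room for the rest of the pipeline on a 24 GB card.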

May 14 '24 23:05 Meatfucker