
High PPL when groupsize != -1 for OPT model after replacing linear layers with QuantLinear

Open · hyx1999 opened this issue 2 years ago · 1 comment

I tried to measure GPTQ's PPL on the OPT model via opt.py. With fake quantization only, the PPL is normal. However, when I call opt_pack before opt_eval and set groupsize to a value other than -1 (e.g. 128), the PPL of the packed model is much larger than that of the fake-quantized model. When groupsize is set to -1, everything is fine.

wbits=4, groupsize=128, without opt_pack: wikitext2 PPL = 28.715469360351562

wbits=4, groupsize=128, with opt_pack: wikitext2 PPL = 778.898193359375
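
For reference, "fake quantization" here means a round-to-nearest quantize/dequantize of the weights in float, with one (scale, zero) pair per groupsize input columns (or one pair per full row when groupsize is -1). Below is a minimal sketch of that baseline, assuming standard asymmetric min/max quantization; the function name and the clamping details are mine, not the repository's Quantizer:

    import torch

    def fake_quant_groupwise(W: torch.Tensor, wbits: int = 4, groupsize: int = 128) -> torch.Tensor:
        # One (scale, zero) pair per `groupsize` input columns of each row;
        # groupsize == -1 means a single pair per full row.
        out_features, in_features = W.shape
        g = in_features if groupsize == -1 else groupsize
        maxq = 2 ** wbits - 1
        Wq = torch.empty_like(W)
        for start in range(0, in_features, g):
            block = W[:, start:start + g]
            # Asymmetric min/max range per row, forced to contain zero.
            wmin = block.min(dim=1, keepdim=True).values.clamp(max=0)
            wmax = block.max(dim=1, keepdim=True).values.clamp(min=0)
            scale = (wmax - wmin).clamp(min=1e-8) / maxq
            zero = torch.round(-wmin / scale)
            q = torch.clamp(torch.round(block / scale) + zero, 0, maxq)
            Wq[:, start:start + g] = scale * (q - zero)  # dequantize back to float
        return Wq

Packing is supposed to store exactly these per-group scales and zeros, so the packed QuantLinear should reproduce the fake-quantized weights numerically; a PPL jump from ~28.7 to ~779 indicates the packed weights no longer match (e.g. groups being paired with the wrong scale/zero during packing or in the kernel).
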

    # Pack the fake-quantized weights into QuantLinear layers *before*
    # evaluating, instead of only before saving. This is the only change
    # versus the stock opt.py.
    if not args.load and args.wbits < 16 and not args.nearest:
        model = opt_pack(model, quantizers, args.wbits, args.groupsize)

    print("model:", "\n", model)

    if args.eval:
        datasets = ['wikitext2']
        if args.new_eval:
            datasets = ['wikitext2']
        for dataset in datasets:
            dataloader, testloader = get_loaders(
                dataset, seed=args.seed, model=args.model,
                seqlen=model.seqlen, cache_dir=args.cache_dir)
            print(dataset)
            opt_eval(model, testloader, DEV)
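
One way to localize the regression is to compare a single layer before and after packing on the same input: if the fake-quantized nn.Linear and the packed QuantLinear built from it disagree, the problem is in opt_pack / QuantLinear rather than in GPTQ itself. A minimal sketch; compare_layer is hypothetical, and the real QuantLinear in this repo may expect half-precision tensors on CUDA:

    import torch

    @torch.no_grad()
    def compare_layer(fake_linear, quant_linear, in_features, atol=1e-2):
        # Run the same random batch through the fake-quantized nn.Linear
        # (reference) and the packed QuantLinear produced by opt_pack.
        x = torch.randn(4, in_features)
        ref = fake_linear(x)    # weights already fake-quantized in float
        out = quant_linear(x)   # packed path under test
        err = (ref.float() - out.float()).abs().max().item()
        print(f"max |ref - out| = {err:.5f}",
              "(mismatch)" if err > atol else "(ok)")

Running this over each quantized layer with groupsize=128 versus groupsize=-1 should show whether the error appears only in the grouped case, matching the PPL numbers above.
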

hyx1999 · Jul 06 '23 13:07


I ran the test above using facebook/opt-125m.

hyx1999 · Jul 06 '23 13:07