GPTQ-for-LLaMa
Running on an old GPU with FP32 only
I'm using an old P40, which does not seem to support FP16.
I tried the latest triton branch and compiled Triton from master.
The inference code shows errors like:
error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
error: 'llvm.intr.fmuladd' op requires the same type for all operands and results
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
I tried replacing all float16 with float32, and the model loads, but:
triton.compiler.errors.CompilationError: at 58:33:
zeros = (zeros >> zeros_shifter[None, :]) & maxq
zeros = (zeros + 1)
a = tl.load(a_ptrs, mask=a_mask, other=0.) # (BLOCK_SIZE_M, BLOCK_SIZE_K)
b = tl.load(b_ptrs) # (BLOCK_SIZE_K, BLOCK_SIZE_N), but repeated
# Now we need to unpack b (which is N-bit values) into 32-bit values
b = (b >> shifter[:, None]) & maxq # Extract the N-bit values
b = (b - zeros) * scales # Scale and shift
accumulator += tl.dot(a, b)
^
AssertionError('lhs and rhs must have the same dtype!')
Any idea how to fix this?
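For reference, here is a minimal sketch I put together of what I think the assert is complaining about: both operands of tl.dot have to share a dtype. This is a plain matmul, not the repo's dequantization kernel; the names are my own, it is untested on a P40, and it assumes M, N and K are multiples of the block sizes (no masking):

```python
import torch
import triton
import triton.language as tl


@triton.jit
def matmul_fp32_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                       stride_am, stride_ak,
                       stride_bk, stride_bn,
                       stride_cm, stride_cn,
                       BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                       BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs)  # assumes no partial tiles, so no mask
        b = tl.load(b_ptrs)
        # Cast both operands to fp32 so lhs/rhs dtypes always match; on a
        # card without usable FP16 this is the only path that compiles anyway.
        acc += tl.dot(a.to(tl.float32), b.to(tl.float32))
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)


def matmul_fp32(a, b):
    # Host-side launcher for the sketch above; block sizes are arbitrary.
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    matmul_fp32_kernel[grid](a, b, c, M, N, K,
                             a.stride(0), a.stride(1),
                             b.stride(0), b.stride(1),
                             c.stride(0), c.stride(1),
                             BLOCK_M=64, BLOCK_N=64, BLOCK_K=32)
    return c
```

In the quantized kernel from the traceback, the equivalent change would be making sure `a`, `scales` and `zeros` all end up in the same dtype before `(b - zeros) * scales` feeds into `tl.dot`, but I have not verified that on this hardware.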
I found there are several branches here:
triton: raises the problem above
cuda: works, but quite slow
old-cuda: works, but still slow and gives weird results
Triton won't support us. They "fixed" it by adding some warnings and asserts. It is not this repo's fault.
The ooba branch, AutoGPTQ, and my fork all work for fast inference. The "faster" kernel that uses FP16 has to be turned off. On Pascal, FP16 runs at half speed, while on a 3090 FP16 and FP32 run at equal speed.
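As a rough illustration of the non-Triton route (my own sketch, not code from this repo; the model path is a placeholder, and the arguments should be checked against your AutoGPTQ version), loading with the CUDA kernels looks roughly like this:

```python
import torch
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_dir = "path/to/llama-7b-4bit-128g"  # hypothetical local quantized checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)

# use_triton=False keeps inference on the CUDA kernels, which sidesteps
# the Triton compile errors on Pascal cards entirely.
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_triton=False,
)

inputs = tokenizer("Hello", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```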
Unrelated to the card, however: is old-cuda still considered the fastest? I'm running a 1080 Ti, but I doubt that matters in this case.
Old-cuda with the faster kernel disabled is the way to go.
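If you are not sure whether your card is in the same boat, a quick capability check along these lines (my own snippet, not from this repo) will tell you; Pascal consumer cards like the P40 and 1080 Ti report compute capability 6.1:

```python
import torch


def prefers_fp32(device: int = 0) -> bool:
    """Heuristic: treat anything below compute capability 7.0 (Volta)
    as a card where the FP16 path is not worth using."""
    major, _minor = torch.cuda.get_device_capability(device)
    return major < 7


if __name__ == "__main__":
    name = torch.cuda.get_device_name(0)
    print(name, "-> use FP32" if prefers_fp32() else "-> FP16 is fine")
```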