qlora: How to implement NormalFloat (NF4) quantization?

Hi, in uniform quantization we can compute x_q = round(x/s) + offset and dequantize with \hat{x} = (x_q - offset) * s. In NF4 quantization, however, we need to find the nearest quantization level (codebook entry) for each x. I am wondering how to implement this efficiently.
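For comparison, the uniform scheme from the question can be written as a minimal NumPy sketch. The function names, the clipping to [0, 2**bits - 1], and the choice of s and offset for a roughly [-1, 1] range are illustrative assumptions, not part of any library.

```python
import numpy as np

def uniform_quantize(x: np.ndarray, s: float, offset: int, bits: int = 4) -> np.ndarray:
    """x_q = round(x / s) + offset, clipped to the representable range [0, 2**bits - 1]."""
    q = np.round(x / s) + offset          # np.round rounds halves to even
    return np.clip(q, 0, 2**bits - 1).astype(np.int32)

def uniform_dequantize(xq: np.ndarray, s: float, offset: int) -> np.ndarray:
    """x_hat = (x_q - offset) * s."""
    return (xq - offset) * s

# Illustrative parameters (assumptions): 4 bits covering roughly [-1, 1]
bits = 4
s = 2.0 / (2**bits - 1)                   # step size between adjacent levels
offset = int(np.round(1.0 / s))           # integer zero point, here 8

x = np.array([-0.7, -0.2, 0.15, 0.9])
xq = uniform_quantize(x, s, offset, bits)
x_hat = uniform_dequantize(xq, s, offset)
print(xq)
print(x_hat)
```

Every step here is plain arithmetic on x. NF4 keeps the same "integer index plus scale" idea but replaces the round(x/s) step with a search for the nearest entry of a fixed, non-uniform table.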

Nov 19 '24 07:11 XA23i

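NF4 works differently from uniform quantization: instead of evenly spaced steps, it uses a fixed, non-uniform codebook of 16 values in [-1, 1]. In QLoRA, each block of weights is first divided by its absolute maximum so that it lies in [-1, 1], and each normalized value is then mapped to the nearest codebook entry. Because the codebook is sorted, you can precompute the midpoints between adjacent entries (15 thresholds) and bucket each value with a vectorized binary search (np.digitize), rather than computing the distance from every value to all 16 codebook entries.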

Below is a minimal example:

import numpy as np

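# NF4 codebook from the QLoRA paper (also used by bitsandbytes):
# 16 sorted values in [-1, 1]: 7 negative, an exact zero, and 8 positive
CODEBOOK = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634, 0.33791524171829224,
    0.44070982933044434, 0.5626170039176941, 0.7229568362236023, 1.0,
], dtype=np.float32)

# Decision boundaries: midpoints between adjacent codebook entries (15 thresholds)
THRESHOLDS = (CODEBOOK[:-1] + CODEBOOK[1:]) / 2

def nf4_quantize(x: np.ndarray):
    """
    Quantize input array x to NF4 values.
    x is assumed to be already normalized to [-1, 1]
    (QLoRA divides each block by its absmax first).
    Returns the quantized values and their indices (0..15) in the codebook.
    """
    idx = np.digitize(x, THRESHOLDS)   # binary search: index of the nearest codebook entry
    return CODEBOOK[idx], idx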

x = np.array([-0.7, -0.2, 0.15, 0.9], dtype=np.float32)
qvals, qidx = nf4_quantize(x)

print("Input:", x)
print("Quantized:", qvals)
print("Indices:", qidx)
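The example above maps values that are already in [-1, 1]. A minimal end-to-end sketch of the blockwise scheme QLoRA describes (one absmax per block of 64 values, nearest NF4 index, two 4-bit indices packed per byte) might look like the following. The function names, the zero-padding of the last block, and the nibble order (even element in the high nibble) are assumptions for illustration; bitsandbytes' actual storage layout may differ, and QLoRA's double quantization of the absmax constants is omitted.

```python
import numpy as np

# NF4 codebook as listed in the QLoRA paper and used by bitsandbytes
NF4 = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634, 0.33791524171829224,
    0.44070982933044434, 0.5626170039176941, 0.7229568362236023, 1.0,
], dtype=np.float32)
THRESHOLDS = (NF4[:-1] + NF4[1:]) / 2   # 15 midpoints between adjacent entries

def quantize_blockwise(w: np.ndarray, block_size: int = 64):
    """Absmax-normalize each block to [-1, 1], pick the nearest NF4 index,
    and pack two 4-bit indices per byte (block_size must be even)."""
    flat = w.astype(np.float32).ravel()
    flat = np.pad(flat, (0, (-flat.size) % block_size))   # zero-pad the last block
    blocks = flat.reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0                              # avoid 0/0 for all-zero blocks
    idx = np.digitize(blocks / absmax, THRESHOLDS).astype(np.uint8)  # values 0..15
    packed = (idx[:, 0::2] << 4) | idx[:, 1::2]            # even element -> high nibble
    return packed, absmax[:, 0], w.shape

def dequantize_blockwise(packed: np.ndarray, absmax: np.ndarray, shape, block_size: int = 64):
    """Unpack the 4-bit indices, look them up in NF4, and rescale by each block's absmax."""
    idx = np.empty((packed.shape[0], block_size), dtype=np.uint8)
    idx[:, 0::2] = packed >> 4
    idx[:, 1::2] = packed & 0x0F
    vals = NF4[idx] * absmax[:, None]
    n = int(np.prod(shape))
    return vals.ravel()[:n].reshape(shape)                # drop the padding

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 100)).astype(np.float32)      # 400 values -> 7 blocks of 64
packed, absmax, shape = quantize_blockwise(w)
w_hat = dequantize_blockwise(packed, absmax, shape)
print(packed.dtype, packed.shape, absmax.shape)           # uint8 (7, 32) (7,)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Dequantization is only a table lookup and one multiply per element, so decoding stays cheap even though encoding needs the nearest-entry search.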

Sep 26 '25 07:09 kudos07

Is the code for computing CODEBOOK available? For example, if one wanted to create a 3-bit codebook or a 6-bit codebook, how would one do it? I tried creating the 4-bit codebook from the paper and got different values from the CODEBOOK above, so I am wondering what I did wrong.

Oct 20 '25 15:10 eak

I found that create_normal_map in https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py answers my question.
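For reference, the construction in create_normal_map can be sketched in plain Python/NumPy and generalized to other bit widths. This is a re-derivation, not the library code: the name create_nf_codebook is hypothetical, statistics.NormalDist().inv_cdf stands in for scipy.stats.norm.ppf, and reusing the 4-bit offset 0.9677083 for 3- or 6-bit codebooks is an assumption (the library function hard-codes the 4-bit layout and pads its result to 256 entries).

```python
from statistics import NormalDist

import numpy as np

def create_nf_codebook(bits: int = 4, offset: float = 0.9677083) -> np.ndarray:
    """Hypothetical k-bit NormalFloat codebook, following the recipe of
    bitsandbytes' create_normal_map (use_extra_value=True). Not library code."""
    ppf = NormalDist().inv_cdf          # inverse CDF of N(0, 1), like scipy.stats.norm.ppf
    half = 2 ** (bits - 1)
    # 2**(bits-1) positive levels: quantiles at probabilities evenly spaced from
    # `offset` down to 0.5, dropping 0.5 itself (which would give 0)
    pos = [ppf(p) for p in np.linspace(offset, 0.5, half + 1)[:-1]]
    # 2**(bits-1) - 1 negative levels, taken from a grid with one fewer point
    neg = [-ppf(p) for p in np.linspace(offset, 0.5, half)[:-1]]
    # add an exact zero, sort, and scale so the extremes are -1 and +1
    values = np.array(sorted(neg + [0.0] + pos))
    return values / values.max()

for b in (3, 4):
    cb = create_nf_codebook(b)
    print(b, len(cb), np.round(cb, 4))
```

With bits=4 this should reproduce the 16-entry NF4 table from the QLoRA paper up to float32 rounding. The asymmetry (one more positive level than negative) is what lets zero be represented exactly with an even number of codes.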

Oct 21 '25 20:10 eak