
FP8Linear.forward cannot be called twice

Open thefacetakt opened this issue 1 year ago • 0 comments

TLDR: Calling forward on an FP8Linear layer twice raises an error:

   ...
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/layers.py", line 892, in forward
    alpha = self.weights_scaling_factor.raw_value * self.activation_scaling_factor.raw_value
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/parameter.py", line 120, in raw_value
    assert isinstance(
AssertionError: Must be np.ndarray. Proper usage: get parameter.raw_value before getting parameter.value

The underlying reason is the order in which .raw_value and .value are accessed:

First, forward reads .raw_value of the scaling factor parameters:

alpha = self.weights_scaling_factor.raw_value * self.activation_scaling_factor.raw_value

After that, there are two calls to .value of these same parameters.

A call to .value overwrites the parameter's ._value with a constant, which makes any further use of .raw_value fail the assertion shown in the traceback.
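To make the sequence concrete, here is a minimal sketch of the same pattern in plain Python. ToyParameter, ToyConstant and toy_forward are hypothetical stand-ins written for this illustration, not the actual TensorRT-LLM classes; the only behaviour they copy from the description above is that .value replaces ._value and .raw_value asserts on its type.

```python
# Toy model of the failure. ToyParameter, ToyConstant and toy_forward are
# hypothetical stand-ins for illustration, not TensorRT-LLM code.


class ToyConstant:
    """Stands in for the constant that .value stores in place of raw data."""

    def __init__(self, data: float):
        self.data = data


class ToyParameter:
    def __init__(self, data: float):
        self._value = data  # starts out as raw data

    @property
    def raw_value(self) -> float:
        # Same idea as the assertion in the traceback: only raw data is accepted.
        assert isinstance(self._value, float), "raw_value read after value"
        return self._value

    @property
    def value(self) -> ToyConstant:
        # The first access overwrites _value, so the raw data is no longer reachable.
        if isinstance(self._value, float):
            self._value = ToyConstant(self._value)
        return self._value


def toy_forward(weight_scale: ToyParameter, act_scale: ToyParameter) -> float:
    # Step 1: read the raw scaling factors.
    alpha = weight_scale.raw_value * act_scale.raw_value
    # Step 2: read .value, which replaces _value on both parameters.
    weight_scale.value
    act_scale.value
    return alpha


def second_call_fails() -> bool:
    w, a = ToyParameter(0.5), ToyParameter(2.0)
    toy_forward(w, a)  # first call succeeds
    try:
        toy_forward(w, a)  # second call hits the raw_value assertion
    except AssertionError:
        return True
    return False
```

In this toy version the first toy_forward call works and the second one raises AssertionError, which matches the behaviour reported for the real layer.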

Hence, it is not possible to call FP8Linear forward twice. This seems like a bug.

(Calling it twice is currently needed, for example, in the cross-attention layer, where we first call self.qkv(hidden_states) and then self.qkv(encoder_output).)

thefacetakt • Jul 26 '24 13:07