fastertransformer_backend
T5 cross_attention output cannot be accessed
Description
As defined in the FasterTransformer T5 guide, T5 decoding has an output value named `cross_attentions`. I cannot find any way to return `cross_attentions` from the FasterTransformer Triton backend for T5.
For reference:
- Output of T5 Decoding
Name | Tensor/Parameter Shape | Location | Data Type | Description |
---|---|---|---|---|
output_ids | [batch_size, beam_width, max_output_seq_len] | GPU | int | The output ids. It contains the input_ids and generated ids |
sequence_length | [batch_size, beam_width] | GPU | int | The lengths of output ids |
output_log_probs | [batch_size, beam_width, request_output_seq_len] | GPU | float | Optional. It records the log probability of logits at each step for sampling. |
cum_log_probs | [batch_size, beam_width] | GPU | float | Optional. Cumulative log probability of generated sentences |
cross_attentions | [num_layer / pipeline_para_size, batch_size, beam_width, head_num / tensor_para_size, max_seq_len, mem_max_seq_len] | GPU | float | Optional. The attention scores of cross attention |
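
To make the `cross_attentions` shape in the table concrete, here is a minimal Python sketch that computes the per-rank shape from the listed dimensions. The function name `cross_attentions_shape` and the example values are hypothetical, and the divisibility checks are an assumption (the table only gives the divisions `num_layer / pipeline_para_size` and `head_num / tensor_para_size`). It does not call FasterTransformer or Triton.

```python
def cross_attentions_shape(num_layer, batch_size, beam_width, head_num,
                           max_seq_len, mem_max_seq_len,
                           pipeline_para_size=1, tensor_para_size=1):
    """Return the cross_attentions shape as listed in the T5 decoding table.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Assumption: layers and heads split evenly across parallel ranks.
    if num_layer % pipeline_para_size != 0:
        raise ValueError("num_layer must be divisible by pipeline_para_size")
    if head_num % tensor_para_size != 0:
        raise ValueError("head_num must be divisible by tensor_para_size")
    return (
        num_layer // pipeline_para_size,  # layers held by one pipeline rank
        batch_size,
        beam_width,
        head_num // tensor_para_size,     # heads held by one tensor rank
        max_seq_len,                      # decoder (output) positions
        mem_max_seq_len,                  # encoder (memory) positions
    )


# Example values are made up for illustration.
shape = cross_attentions_shape(
    num_layer=12, batch_size=2, beam_width=4, head_num=12,
    max_seq_len=32, mem_max_seq_len=64, tensor_para_size=2,
)
print(shape)
```

With tensor parallelism, each rank would hold only its own slice of the heads, so the full attention tensor would have to be gathered across ranks before use.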
The API is not exposed in the Triton backend yet, which is why you cannot find this output in the Triton backend's T5 documentation.