fastertransformer_backend

T5 cross_attention output cannot be accessed

Open · JustinAWei opened this issue 2 years ago · 1 comment

Description

As defined in the FasterTransformer T5 guide, there is an output value for cross_attentions. I cannot find any way to return cross_attentions from the FasterTransformer Triton backend for T5.

For reference:

  • Output of T5 Decoding

| Name | Tensor/Parameter Shape | Location | Data Type | Description |
|---|---|---|---|---|
| output_ids | [batch_size, beam_width, max_output_seq_len] | GPU | int | The output ids. It contains the input_ids and generated ids. |
| sequence_length | [batch_size, beam_width] | GPU | int | The lengths of output ids. |
| output_log_probs | [batch_size, beam_width, request_output_seq_len] | GPU | float | Optional. It records the log probability of logits at each step for sampling. |
| cum_log_probs | [batch_size, beam_width] | GPU | float | Optional. Cumulative log probability of generated sentences. |
| cross_attentions | [num_layer / pipeline_para_size, batch_size, beam_width, head_num / tensor_para_size, max_seq_len, mem_max_seq_len] | GPU | float | Optional. The attention scores of cross attention. |
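
To get a feel for what the cross_attentions row in the table above implies, here is a minimal sketch that computes the per-rank shape and buffer size from the listed dimensions. The `DecodingConfig` container and both helper functions are hypothetical (they are not part of FasterTransformer or the Triton backend). The divisibility checks and the 4-bytes-per-element float32 size are assumptions made for this illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class DecodingConfig:
    """Hypothetical container for the sizes that appear in the cross_attentions shape."""
    num_layer: int
    head_num: int
    pipeline_para_size: int
    tensor_para_size: int
    batch_size: int
    beam_width: int
    max_seq_len: int
    mem_max_seq_len: int


def cross_attentions_shape(cfg: DecodingConfig) -> tuple[int, ...]:
    """Per-rank shape of cross_attentions, mirroring the table row."""
    # Assumption: layers split evenly across pipeline stages, heads across tensor-parallel ranks.
    if cfg.num_layer % cfg.pipeline_para_size:
        raise ValueError("num_layer must be divisible by pipeline_para_size")
    if cfg.head_num % cfg.tensor_para_size:
        raise ValueError("head_num must be divisible by tensor_para_size")
    return (
        cfg.num_layer // cfg.pipeline_para_size,
        cfg.batch_size,
        cfg.beam_width,
        cfg.head_num // cfg.tensor_para_size,
        cfg.max_seq_len,
        cfg.mem_max_seq_len,
    )


def cross_attentions_bytes(cfg: DecodingConfig, bytes_per_elem: int = 4) -> int:
    """Size in bytes of the per-rank GPU buffer (assumes float32, 4 bytes per element)."""
    return prod(cross_attentions_shape(cfg)) * bytes_per_elem
```

For example, with illustrative values of 12 layers, 12 heads, pipeline_para_size=1, tensor_para_size=2, batch_size=8, beam_width=4, max_seq_len=128 and mem_max_seq_len=512, each rank would hold a (12, 8, 4, 6, 128, 512) float buffer of 576 MiB. That gives a sense of how large this optional output can get.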

JustinAWei · Jan 07 '23 02:01

The API is not exposed in the Triton backend yet. That's why you cannot find this output in the T5 documentation of the Triton backend.

byshiue · Jan 09 '23 00:01