fastertransformer_backend
Some questions
How should I use the FasterTransformer Triton backend to deploy a custom model, for example one that adds extra structures after BERT? Assume my model is defined like this:
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel
from transformers.models.bert.modeling_bert import BertSelfAttention

class HfClassModel(BertPreTrainedModel):
    def __init__(self, config, ma_config):
        super(HfClassModel, self).__init__(config)
        self.bert = BertModel(config)
        self.multi_head_attention = BertSelfAttention(ma_config)
        self.start_project = nn.Linear(in_features=ma_config.hidden_size, out_features=1)
        self.end_project = nn.Linear(in_features=ma_config.hidden_size, out_features=1)

    def forward(self, input_ids, input_type_ids, input_mask,
                standard_input_ids, standard_type_ids, standard_input_mask):
        # Run the user and "standard" sequences through BERT as a single batch.
        mix_input_ids = torch.cat([input_ids, standard_input_ids], 0)
        mix_input_mask = torch.cat([input_mask, standard_input_mask], 0)
        mix_input_type_ids = torch.cat([input_type_ids, standard_type_ids], 0)
        bert = self.bert(input_ids=mix_input_ids, attention_mask=mix_input_mask,
                         token_type_ids=mix_input_type_ids)
        last_hidden_state, pooler_output = bert[0], bert[1]

        # Split the batch back into its two halves.
        user_hidden_state, standard_hidden_state = torch.chunk(last_hidden_state, 2, dim=0)

        # Cross-attention between the two halves, then start/end projections.
        mix_hidden_state = self.multi_head_attention(
            hidden_states=user_hidden_state,
            encoder_hidden_states=standard_hidden_state)[0]
        start_logits = self.start_project(mix_hidden_state).squeeze(-1)
        end_logits = self.end_project(mix_hidden_state).squeeze(-1)
        return start_logits, end_logits
At this point, I have some questions:
- Model checkpoint conversion. I think I need to convert both the BERT part and the custom multi-head attention part (see the sketch after the config snippet below).
- config.pbtxt. Should the inputs and outputs be defined like this?
input [
  {
    name: "input_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "segment_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "input_mask"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "std_input_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "std_segment_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "std_input_mask"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "start_logits"
    data_type: TYPE_FP16
    dims: [ -1 ]
  },
  {
    name: "end_logits"
    data_type: TYPE_FP16
    dims: [ -1 ]
  }
]
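On the checkpoint side, here is a minimal sketch of one possible approach, assuming the fine-tuned weights sit in a single PyTorch state_dict (file names below are hypothetical): let FasterTransformer's standard Hugging Face BERT converter handle the encoder, and export the custom layers separately.

import os
import torch

# Load the fine-tuned HfClassModel state_dict (hypothetical file name).
ckpt = torch.load("hf_class_model.bin", map_location="cpu")

# The encoder weights ("bert.*") are what the stock FT BERT converter understands;
# strip the prefix and save them as a plain BERT checkpoint.
os.makedirs("bert_only", exist_ok=True)
bert_state = {k[len("bert."):]: v for k, v in ckpt.items() if k.startswith("bert.")}
torch.save(bert_state, "bert_only/pytorch_model.bin")

# The custom layers (multi_head_attention, start_project, end_project) have no
# counterpart in FT's Bert.cc and need custom handling: either new FT layers,
# or a separate model/stage in Triton that consumes FT's hidden states.
custom_state = {k: v for k, v in ckpt.items() if not k.startswith("bert.")}
torch.save(custom_state, "custom_heads.bin")

The bert_only checkpoint (plus the original config.json) could then go through the converter shipped with FasterTransformer's BERT examples (the exact script name and flags depend on the FT release you use). Note also that, as far as I can tell, the stock bert config.pbtxt in this backend only declares input_ids / sequence_length inputs and an output_hidden_state output, so the extra std_* inputs and the two logits outputs above would need backend-side changes as well.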
If you change the model architecture, you should modify the FT source code first.
Can you please provide me with some references? I would greatly appreciate it.
You need to check which files you need to modify, such as https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/models/bert/Bert.cc and other related files.
Thank you very much for the advice; I will go and study it.
Have you solved your problem? I've run into the same issue.
Hmm, not yet. Still digging into the source code...
I feel like there should be support for this; otherwise, the vast majority of models used in real-world scenarios wouldn't work at all.
I haven't found anything so far. Then again, with LLMs these days, you basically never customize the architecture anyway.
The demo only seems to output the output_hidden_state feature tensor. If I just want simple binary classification with BERT, do you know how to do that?
I haven't figured that out yet.
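For the binary-classification case, one possible workaround is to let FT return the hidden states and run the classification head on the client side. The sketch below assumes the tensor names from the bert demo's config.pbtxt (input_ids, sequence_length, output_hidden_state), a Triton model named "fastertransformer", and a hypothetical weight file for the fine-tuned pooler/classifier; adjust all of these to your setup.

import numpy as np
import torch
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Toy tokenized input; in practice this comes from the BERT tokenizer.
input_ids = np.array([[101, 2023, 2003, 1037, 3231, 102]], dtype=np.int32)
seq_len = np.array([[input_ids.shape[1]]], dtype=np.int32)

inputs = [
    httpclient.InferInput("input_ids", input_ids.shape, "INT32"),
    httpclient.InferInput("sequence_length", seq_len.shape, "INT32"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(seq_len)

result = client.infer(
    "fastertransformer", inputs,
    outputs=[httpclient.InferRequestedOutput("output_hidden_state")])
hidden = torch.from_numpy(result.as_numpy("output_hidden_state")).float()  # [batch, seq, hidden]

# Apply the fine-tuned pooler + classifier outside FT
# ("classifier_head.bin" and its keys are hypothetical, exported beforehand).
head = torch.load("classifier_head.bin")
pooled = torch.tanh(hidden[:, 0] @ head["pooler.weight"].T + head["pooler.bias"])
logits = pooled @ head["classifier.weight"].T + head["classifier.bias"]  # [batch, 2]
pred = logits.argmax(dim=-1)

This keeps FT serving only the encoder, which is the part it accelerates anyway; the tiny classification head adds negligible cost on the client (or could live in a separate Python-backend model in the same Triton ensemble).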