fastertransformer_backend
Some questions
How should I use the FasterTransformer Triton backend to deploy a custom model, for example one that adds extra structures after BERT? Assume my model is defined like this:
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel
from transformers.models.bert.modeling_bert import BertSelfAttention

class HfClassModel(BertPreTrainedModel):
    def __init__(self, config, ma_config):
        super(HfClassModel, self).__init__(config)
        self.bert = BertModel(config)
        self.multi_head_attention = BertSelfAttention(ma_config)
        self.start_project = nn.Linear(in_features=ma_config.hidden_size, out_features=1)
        self.end_project = nn.Linear(in_features=ma_config.hidden_size, out_features=1)

    def forward(self, input_ids, input_type_ids, input_mask,
                standard_input_ids, standard_type_ids, standard_input_mask):
        # Run the user and "standard" sequences through BERT as a single batch.
        mix_input_ids = torch.cat([input_ids, standard_input_ids], 0)
        mix_input_mask = torch.cat([input_mask, standard_input_mask], 0)
        mix_input_type_ids = torch.cat([input_type_ids, standard_type_ids], 0)
        bert = self.bert(input_ids=mix_input_ids, attention_mask=mix_input_mask,
                         token_type_ids=mix_input_type_ids)
        last_hidden_state, pooler_output = bert[0], bert[1]

        # Split the batch back into its two halves.
        user_hidden_state, standard_hidden_state = torch.chunk(last_hidden_state, 2, dim=0)

        # Cross-attention between the two halves, then start/end projections.
        mix_hidden_state = self.multi_head_attention(
            hidden_states=user_hidden_state,
            encoder_hidden_states=standard_hidden_state)[0]
        start_logits = self.start_project(mix_hidden_state).squeeze(-1)
        end_logits = self.end_project(mix_hidden_state).squeeze(-1)
        return start_logits, end_logits
At this point, I have some questions:
- Model checkpoint conversion. I think I need to convert both the BERT part and the custom multi-head attention part (see the sketch after the config snippet below).
- config.pbtxt. Should the inputs and outputs be defined like this?
input [
  {
    name: "input_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "segment_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "input_mask"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "std_input_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "std_segment_ids"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  },
  {
    name: "std_input_mask"
    data_type: TYPE_UINT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "start_logits"
    data_type: TYPE_FP16
    dims: [ -1 ]
  },
  {
    name: "end_logits"
    data_type: TYPE_FP16
    dims: [ -1 ]
  }
]
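On the checkpoint side, here is a minimal sketch of one possible approach, assuming the fine-tuned weights sit in a single PyTorch state_dict (file names below are hypothetical): let FasterTransformer's standard Hugging Face BERT converter handle the encoder, and export the custom layers separately.

import os
import torch

# Load the fine-tuned HfClassModel state_dict (hypothetical file name).
ckpt = torch.load("hf_class_model.bin", map_location="cpu")

# The encoder weights ("bert.*") are what the stock FT BERT converter understands;
# strip the prefix and save them as a plain BERT checkpoint.
os.makedirs("bert_only", exist_ok=True)
bert_state = {k[len("bert."):]: v for k, v in ckpt.items() if k.startswith("bert.")}
torch.save(bert_state, "bert_only/pytorch_model.bin")

# The custom layers (multi_head_attention, start_project, end_project) have no
# counterpart in FT's Bert.cc and need custom handling: either new FT layers,
# or a separate model/stage in Triton that consumes FT's hidden states.
custom_state = {k: v for k, v in ckpt.items() if not k.startswith("bert.")}
torch.save(custom_state, "custom_heads.bin")

The bert_only checkpoint (plus the original config.json) could then go through the converter shipped with FasterTransformer's BERT examples (the exact script name and flags depend on the FT release you use). Note also that, as far as I can tell, the stock bert config.pbtxt in this backend only declares input_ids / sequence_length inputs and an output_hidden_state output, so the extra std_* inputs and the two logits outputs above would need backend-side changes as well.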
If you change the model architecture, you should modify the FT source code first.
Can you please provide me with some references? I would greatly appreciate it.
You need to check which files you need to modify, such as https://github.com/NVIDIA/FasterTransformer/blob/main/src/fastertransformer/models/bert/Bert.cc and other related files.
Thank you very much for the advice; I will go and study it.
Have you solved your problem? I've run into the same issue.
Hmm, not yet. Still digging into the source code...
I feel like there should be support for this; otherwise, the vast majority of models used in real-world scenarios wouldn't work at all.
I haven't found anything so far. Then again, with LLMs these days, you basically never customize the architecture anyway.
The demo only seems to output the output_hidden_state feature tensor. If I just want simple binary classification with BERT, do you know how to do that?
I haven't figured that out yet.
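For the binary-classification case, one possible workaround is to let FT return the hidden states and run the classification head on the client side. The sketch below assumes the tensor names from the bert demo's config.pbtxt (input_ids, sequence_length, output_hidden_state), a Triton model named "fastertransformer", and a hypothetical weight file for the fine-tuned pooler/classifier; adjust all of these to your setup.

import numpy as np
import torch
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Toy tokenized input; in practice this comes from the BERT tokenizer.
input_ids = np.array([[101, 2023, 2003, 1037, 3231, 102]], dtype=np.int32)
seq_len = np.array([[input_ids.shape[1]]], dtype=np.int32)

inputs = [
    httpclient.InferInput("input_ids", input_ids.shape, "INT32"),
    httpclient.InferInput("sequence_length", seq_len.shape, "INT32"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(seq_len)

result = client.infer(
    "fastertransformer", inputs,
    outputs=[httpclient.InferRequestedOutput("output_hidden_state")])
hidden = torch.from_numpy(result.as_numpy("output_hidden_state")).float()  # [batch, seq, hidden]

# Apply the fine-tuned pooler + classifier outside FT
# ("classifier_head.bin" and its keys are hypothetical, exported beforehand).
head = torch.load("classifier_head.bin")
pooled = torch.tanh(hidden[:, 0] @ head["pooler.weight"].T + head["pooler.bias"])
logits = pooled @ head["classifier.weight"].T + head["classifier.bias"]  # [batch, 2]
pred = logits.argmax(dim=-1)

This keeps FT serving only the encoder, which is the part it accelerates anyway; the tiny classification head adds negligible cost on the client (or could live in a separate Python-backend model in the same Triton ensemble).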