Ray Huang

6 comments by Ray Huang

BTW, my model is BERT, any hints?

> @rahuan can you try to run it with the latest Triton server (rebuild the image if you are not using the latest one)? And enable verbose logging...
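
For reference, a minimal sketch of that suggestion in Python, assuming `tritonserver` is on the PATH; the model-repository path is a placeholder, and `--log-verbose=1` is the standard tritonserver flag for detailed per-request logs:

```python
# Minimal sketch: launch tritonserver with verbose logging enabled.
# The model-repository path is a placeholder; adjust it to your setup.
import subprocess

def launch_triton(model_repo: str) -> subprocess.Popen:
    """Start tritonserver so detailed per-request logs are printed."""
    cmd = [
        "tritonserver",
        f"--model-repository={model_repo}",
        "--log-verbose=1",  # emit verbose logs for each request
    ]
    return subprocess.Popen(cmd)

if __name__ == "__main__":
    server = launch_triton("/workspace/model_repository")
    server.wait()
```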

I just synced the latest code of fastertransformer_backend; it now fails even faster, even at a very low QPS. Below are the errors:

I1212 06:38:03.990948 1 libfastertransformer.cc:1022] get total batch_size = 1
I1212...

> What seq length are you using?

Batch_size is 10 or 20; the seq length is different for each sentence in a batch, the average is about 50~60, but the max...
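
A minimal client-side sketch of such a batch, padding each request to the per-batch max length; the tensor names (`input_ids`, `sequence_length`), the INT32 datatypes, and the model name `bert` are assumptions here, so check them against your config.pbtxt:

```python
# Hedged sketch: pad variable-length token sequences to the batch max
# and send them to Triton over HTTP. Tensor/model names are assumptions.
import numpy as np
import tritonclient.http as httpclient

def infer_batch(sequences, url="localhost:8000", model="bert"):
    """Pad a batch of token-id lists to its max length and run inference."""
    max_len = max(len(s) for s in sequences)  # per-batch max, ~50-60 avg here
    input_ids = np.zeros((len(sequences), max_len), dtype=np.int32)
    seq_len = np.zeros((len(sequences), 1), dtype=np.int32)
    for i, s in enumerate(sequences):
        input_ids[i, : len(s)] = s
        seq_len[i, 0] = len(s)

    inputs = [
        httpclient.InferInput("input_ids", list(input_ids.shape), "INT32"),
        httpclient.InferInput("sequence_length", list(seq_len.shape), "INT32"),
    ]
    inputs[0].set_data_from_numpy(input_ids)
    inputs[1].set_data_from_numpy(seq_len)

    client = httpclient.InferenceServerClient(url=url)
    return client.infer(model, inputs)
```

Padding to the per-batch max (rather than a fixed global max) keeps the padded width close to the real lengths, and it should matter even less if is_remove_padding is enabled.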

The model settings are the same as bert-base-chinese: layer num is 12, head num is 12, hidden size is 768 = 64*12. Thanks! BTW, the data_type is fp16, is_remove_padding...
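
As a quick consistency check of those numbers (plain Python, nothing backend-specific; the dict keys just mirror the names used above):

```python
# The settings quoted above, collected for a quick consistency check.
# Keys mirror the names used in the comment; this is not a config file format.
settings = {
    "layer_num": 12,
    "head_num": 12,
    "size_per_head": 64,
    "data_type": "fp16",
    "is_remove_padding": True,
}

# hidden size = head_num * size_per_head = 12 * 64 = 768
hidden_size = settings["head_num"] * settings["size_per_head"]
assert hidden_size == 768
```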

@PerkzZheng, may I ask if there are any findings about this issue?