0xd8b

13 comments by 0xd8b

After converting the T5 model, with "remove input padding" set to false and a maximum batch size of 8, when model inference is set to...

> @thanhlt998 fixed. It was caused by missing CUDA stream synchronization between the encoder stream and the decoder stream. The fix will be released in next week's weekly main branch update. For...

@QiJune I encountered the same issue with the T5 model (float16): under extensive sample testing, the inference results vary slightly with different batch sizes. Is this expected behavior? I...
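One plausible cause of such small batch-size-dependent differences under float16 (an assumption, not confirmed in the thread) is that different batch sizes can select different kernels and reduction orders, and float16 addition is not associative. A minimal NumPy sketch of the effect:

```python
import numpy as np

# Same float16 values summed in two different orders, as two
# differently-tiled kernels might do. float16 addition is not
# associative, so the results can disagree slightly even though
# both are valid half-precision sums.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float16)

# Order 1: sequential left-to-right accumulation in float16.
seq = np.float16(0.0)
for v in x:
    seq = np.float16(seq + v)

# Order 2: blocked (pairwise-style) accumulation in float16.
blocked = x.reshape(32, 32).sum(axis=1, dtype=np.float16).sum(dtype=np.float16)

# float64 reference: both float16 sums approximate this value.
ref = float(x.astype(np.float64).sum())

print(float(seq), float(blocked), ref)
```

Both half-precision results land near the float64 reference; the small gap between them is the same kind of batch-size-dependent discrepancy described above, and it is generally considered normal for half-precision inference rather than a correctness bug.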