shixianc
@ptarasiewiczNV Thank you for the reply. Regarding (1), it would be nice to have that, as some of our models are small enough (they can be loaded on a 16GB GPU)...
Hi, is there any update on this feature? It would be quite useful for loading large LLMs from S3.
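In case it helps others, here's a minimal sketch of the workaround we use today: sync the checkpoint from S3 to local disk first, then point vLLM at the local path. The bucket path and model directory are placeholders.

```python
# Sketch of the S3 workaround: download the weights, then load locally.
# Bucket/prefix are hypothetical; requires the AWS CLI to be configured.
import subprocess
from vllm import LLM

local_dir = "/tmp/my-model"

# `aws s3 sync` copies the HF-format checkpoint
# (config.json, tokenizer files, *.safetensors, ...) to local disk.
subprocess.run(
    ["aws", "s3", "sync", "s3://my-bucket/models/my-model/", local_dir],
    check=True,
)

# Load from the local copy as usual.
llm = LLM(model=local_dir)
```

Native streaming from S3 would obviously be nicer, since this doubles the disk footprint and adds a cold-start delay.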
The automatic prefix caching commit seems to have been merged very recently and is labeled for the 0.3.4 release, so I assume some of the changes are not available on 0.3.3. Update: actually, I just tested that...
@robertgshaw2-neuralmagic thanks, we're really looking forward to the optimization! Also, could you clarify the behavior of this feature: 1. in the same batch, the first N tokens of the requests...
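To make the question concrete, here is a sketch of the scenario I mean (the model name is a placeholder, and I'm assuming the `enable_prefix_caching` flag from the merged PR, which may only exist on 0.3.4+):

```python
# Sketch: several requests in one batch sharing a long common prefix.
# With automatic prefix caching enabled, the KV blocks for the shared
# prefix should (as I understand it) be computed once and reused.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model
    enable_prefix_caching=True,
)

shared_prefix = "You are a helpful assistant. " * 50  # long common prefix
questions = ["What is a KV cache?", "What is paged attention?"]
prompts = [shared_prefix + q for q in questions]

outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
```

Is reuse guaranteed within the same batch like this, or only across batches once the prefix blocks have been cached?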
Do we have an ETA? 😊
Is anyone able to run it on 4 A10 GPUs? 4 × 24 GB = 96 GB
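This is roughly what I tried, for reference (a sketch; the model name is a placeholder): shard the model across the 4 A10s with tensor parallelism.

```python
# Sketch: tensor parallelism across 4 A10s (24 GB each).
from vllm import LLM

llm = LLM(
    model="my-org/my-large-model",  # placeholder
    tensor_parallel_size=4,         # one shard per A10
    gpu_memory_utilization=0.90,    # leave some headroom per GPU
)
```

Even with 96 GB total, keep in mind the KV cache also needs room, so the usable budget for weights is noticeably less than 96 GB.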
Hi, is there an update on this?
@symphonylyh Thanks for the update! Starting with (3) would unblock our team. May I assume this would also support classic dynamic batching?