Seung Ho Jang


Hello, I wonder if #584 also applies to GPT-J? I am testing inference with Triton Inference Server's FasterTransformer backend and a converted GPT-J model, and the inference time grows in proportion...
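
For context, a minimal sketch of the kind of timing test described above, using the `tritonclient` Python package. The model name (`fastertransformer`) and tensor names (`input_ids`, `input_lengths`, `request_output_len`) are assumptions based on the stock fastertransformer_backend GPT-J config and may differ in a given deployment:

```python
# Probe how latency scales with requested output length against a Triton
# FasterTransformer GPT-J model. Model/tensor names are assumptions; check
# the deployment's config.pbtxt before use.
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Pre-tokenized prompt (token IDs here are placeholders).
prompt_ids = np.array([[818, 428, 2050]], dtype=np.uint32)

for out_len in (32, 64, 128, 256):
    inputs = [
        httpclient.InferInput("input_ids", list(prompt_ids.shape), "UINT32"),
        httpclient.InferInput("input_lengths", [1, 1], "UINT32"),
        httpclient.InferInput("request_output_len", [1, 1], "UINT32"),
    ]
    inputs[0].set_data_from_numpy(prompt_ids)
    inputs[1].set_data_from_numpy(np.array([[prompt_ids.shape[1]]], dtype=np.uint32))
    inputs[2].set_data_from_numpy(np.array([[out_len]], dtype=np.uint32))

    start = time.perf_counter()
    client.infer("fastertransformer", inputs)
    print(f"output_len={out_len}: {time.perf_counter() - start:.3f}s")
```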

@devin12422 No; since FasterTransformer is deprecated and TensorRT-LLM has succeeded it, I just used tensorrtllm_backend instead, and it seemed to work fine.
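
For anyone landing here later, a minimal sketch of querying a Triton server running tensorrtllm_backend via `tritonclient`. The `ensemble` model name and the `text_input` / `max_tokens` / `text_output` tensor names are assumptions based on the example configs shipped with the tensorrtllm_backend repo and may differ in your setup:

```python
# Minimal client call against a Triton server running tensorrtllm_backend.
# Model and tensor names ("ensemble", "text_input", "max_tokens",
# "text_output") are assumptions from the repo's example ensemble configs;
# adjust them to match your config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# TYPE_STRING tensors are passed as numpy object arrays of str/bytes.
text = np.array([["Hello, GPT-J"]], dtype=object)
max_tokens = np.array([[64]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", list(text.shape), "BYTES"),
    httpclient.InferInput("max_tokens", list(max_tokens.shape), "INT32"),
]
inputs[0].set_data_from_numpy(text)
inputs[1].set_data_from_numpy(max_tokens)

result = client.infer("ensemble", inputs)
print(result.as_numpy("text_output"))
```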