jinluyang

10 comments by jinluyang

It seems some CMake files lack `-lmpi_cxx`, so there are some compile errors.
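A hypothetical CMake fragment showing one way to link the MPI C++ bindings explicitly; the target name `my_target` is an assumption, not taken from the project's actual build files:

```cmake
# Assumed sketch: find MPI and link its C++ bindings explicitly.
# On Open MPI, MPI_CXX_LIBRARIES typically expands to include -lmpi_cxx.
find_package(MPI REQUIRED)
target_link_libraries(my_target PRIVATE ${MPI_CXX_LIBRARIES})
target_include_directories(my_target PRIVATE ${MPI_CXX_INCLUDE_PATH})
```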

> We will solve the issue 2 and 3 first.
> And for issue 1, we will take some time to check.
> Do you encounter any problem due to...

> > > We will solve the issue 2 and 3 first.
> > > And for issue 1, we will take some time to check.
> > > Do...

> It should work because the entire method is wrapped in tf.function. There is a test case in
> https://github.com/onnx/onnx-tensorflow/blob/0874ca1378a8fe2d06e66f23323c3f828652e900/test/backend/test_dynamic_shape.py#L336
> It will produce indices_shape as a...

I got the same error on CentOS 6.5 with Python 2.7.

I guess it may be because it's pipeline parallelism. Sorry, I haven't actually tried it, but I think multi-GPU setups like this would require us to process data asynchronously, that is, for throughput....

Forgive my boldness: is it possible to just change `shared.history` to something like `shared.history[user_id]` to support this, so that the histories are kept separate?
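A minimal sketch of the idea, assuming a shared-state object with a single global history list; the `SharedState` class, `user_id` key, and message format here are illustrative assumptions, not the project's actual API:

```python
from collections import defaultdict

class SharedState:
    """Hypothetical shared state keyed per user instead of one global list."""

    def __init__(self):
        # user_id -> list of (role, text) messages; each user sees only
        # their own history, so concurrent sessions do not interleave.
        self.history = defaultdict(list)

    def append(self, user_id, role, text):
        self.history[user_id].append((role, text))

    def get_history(self, user_id):
        return self.history[user_id]

shared = SharedState()
shared.append("alice", "user", "hello")
shared.append("bob", "user", "hi")
# alice's history contains only her own message
```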

would be glad to help do a part of the work, for example converting the weights to FT

@cameronfr I think the reshape of qkv here might not be correct: https://github.com/cameronfr/FasterTransformer/blob/45d48f9d06713cd006f7d95d4b2f99a4bd3abb11/examples/cpp/llama/huggingface_llama_convert.py#L97
Since the Hugging Face-format qkv projection is prepared for rotary embedding (https://github.com/huggingface/transformers/blob/d04ec99bec8a0b432fc03ed60cea9a1a20ebaf3c/src/transformers/models/llama/convert_llama_weights_to_hf.py#L101), I tried something like...
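A hedged sketch of the head-dimension permutation that the linked Hugging Face conversion script applies to the q/k projection weights, and its inverse, which a HF-to-FasterTransformer converter would plausibly need to undo before the weights match FT's rotary-embedding layout. The function names and the use of NumPy are my own; shapes follow my reading of the linked scripts and are not verified against FasterTransformer:

```python
import numpy as np

def hf_permute(w, n_heads, dim):
    # Layout change applied by convert_llama_weights_to_hf.py: within each
    # head, regroup the rotary dimension pairs (interleaved -> split halves).
    half = dim // n_heads // 2
    return w.reshape(n_heads, half, 2, dim).swapaxes(1, 2).reshape(dim, dim)

def hf_unpermute(w, n_heads, dim):
    # Inverse permutation: recover the original (Meta-style) weight layout
    # from a Hugging Face checkpoint before converting to FT.
    half = dim // n_heads // 2
    return w.reshape(n_heads, 2, half, dim).swapaxes(1, 2).reshape(dim, dim)

# Round-trip check on a small weight matrix (dim=12, n_heads=2).
w = np.arange(144, dtype=np.float32).reshape(12, 12)
assert np.array_equal(hf_unpermute(hf_permute(w, 2, 12), 2, 12), w)
```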

Same problem when running Qwen-14B with float16: my sentence output is different from pure PyTorch. Using trtllm-0.8.0. @byshiue