jinluyang

10 comments by jinluyang

It seems some CMake files lack `-lmpi_cxx`, so there are some compile errors.
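A hypothetical CMake fragment showing one way to link the MPI C++ bindings explicitly; the target name `my_target` is an assumption, not taken from the project's actual build files:

```cmake
# Assumed sketch: find MPI and link its C++ bindings explicitly.
# On Open MPI, MPI_CXX_LIBRARIES typically expands to include -lmpi_cxx.
find_package(MPI REQUIRED)
target_link_libraries(my_target PRIVATE ${MPI_CXX_LIBRARIES})
target_include_directories(my_target PRIVATE ${MPI_CXX_INCLUDE_PATH})
```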

> We will solve the issue 2 and 3 first.
> And for issue 1, we will take some time to check.
> Do you encounter any problem due to...

> > > We will solve the issue 2 and 3 first.
> > > And for issue 1, we will take some time to check.
> > > Do...

> It should work because the entire method is wrapped in tf.function. There is a test case in
> https://github.com/onnx/onnx-tensorflow/blob/0874ca1378a8fe2d06e66f23323c3f828652e900/test/backend/test_dynamic_shape.py#L336
> It will produce indices_shape as a...

I got the same error on CentOS 6.5 with Python 2.7.

I guess it may be because it's pipeline parallelism. Sorry, I haven't actually tried it, but I think multi-GPU setups like this would require us to process data asynchronously, that is, for throughput....

Forgive my boldness: is it possible to just change `shared.history` to something like `shared.history[user_id]` to support this, so that the histories are kept separate?
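A minimal sketch of the idea, assuming a shared-state object with a single global history list; the `SharedState` class, `user_id` key, and message format here are illustrative assumptions, not the project's actual API:

```python
from collections import defaultdict

class SharedState:
    """Hypothetical shared state keyed per user instead of one global list."""

    def __init__(self):
        # user_id -> list of (role, text) messages; each user sees only
        # their own history, so concurrent sessions do not interleave.
        self.history = defaultdict(list)

    def append(self, user_id, role, text):
        self.history[user_id].append((role, text))

    def get_history(self, user_id):
        return self.history[user_id]

shared = SharedState()
shared.append("alice", "user", "hello")
shared.append("bob", "user", "hi")
# alice's history contains only her own message
```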

would be glad to help do a part of the work, for example converting the weights to FT

@cameronfr I think the reshape of qkv here might not be correct: https://github.com/cameronfr/FasterTransformer/blob/45d48f9d06713cd006f7d95d4b2f99a4bd3abb11/examples/cpp/llama/huggingface_llama_convert.py#L97
Since the Hugging Face-format qkv projection is prepared for rotary embedding (https://github.com/huggingface/transformers/blob/d04ec99bec8a0b432fc03ed60cea9a1a20ebaf3c/src/transformers/models/llama/convert_llama_weights_to_hf.py#L101), I tried something like...
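A hedged sketch of the head-dimension permutation that the linked Hugging Face conversion script applies to the q/k projection weights, and its inverse, which a HF-to-FasterTransformer converter would plausibly need to undo before the weights match FT's rotary-embedding layout. The function names and the use of NumPy are my own; shapes follow my reading of the linked scripts and are not verified against FasterTransformer:

```python
import numpy as np

def hf_permute(w, n_heads, dim):
    # Layout change applied by convert_llama_weights_to_hf.py: within each
    # head, regroup the rotary dimension pairs (interleaved -> split halves).
    half = dim // n_heads // 2
    return w.reshape(n_heads, half, 2, dim).swapaxes(1, 2).reshape(dim, dim)

def hf_unpermute(w, n_heads, dim):
    # Inverse permutation: recover the original (Meta-style) weight layout
    # from a Hugging Face checkpoint before converting to FT.
    half = dim // n_heads // 2
    return w.reshape(n_heads, 2, half, dim).swapaxes(1, 2).reshape(dim, dim)

# Round-trip check on a small weight matrix (dim=12, n_heads=2).
w = np.arange(144, dtype=np.float32).reshape(12, 12)
assert np.array_equal(hf_unpermute(hf_permute(w, 2, 12), 2, 12), w)
```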

Same problem when running Qwen-14B with float16: my sentence output is different from pure PyTorch. Using trtllm-0.8.0. @byshiue