ppppppppig
### Is there an existing issue for this?

- [X] I have searched the existing issues

### Current Behavior

By reading the GLM-related papers, I have summarized the differences between GLM and GLM-130B:

| Model | PE | Normalization |
| ------------- |-------------|...
From this [article](https://www.anyscale.com/blog/continuous-batching-llm-inference), I learned that continuous batching and PagedAttention greatly improve the inference performance of large models. I would like to know if FasterTransformer has plans to support these...
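To make the question concrete: continuous (iteration-level) batching means the server refills a freed batch slot as soon as one request finishes decoding, instead of waiting for the whole batch to drain. Below is a minimal, self-contained sketch of that scheduling idea; the function name and the `(id, tokens_to_generate)` request format are illustrative and are not FasterTransformer's or Triton's API.

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy scheduler illustrating iteration-level (continuous) batching.

    Each request is (id, tokens_to_generate). A slot is refilled as soon
    as its request completes, so new requests join the batch mid-flight
    rather than waiting for the whole batch to finish.
    """
    waiting = deque(requests)
    running = {}   # request id -> remaining decode steps
    trace = []     # (step, ids decoded at that step), for inspection
    step = 0
    while waiting or running:
        # Admit new requests into free slots at every iteration.
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        # One decoding step for the whole current batch.
        trace.append((step, sorted(running)))
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # slot freed immediately
        step += 1
    return trace

trace = continuous_batching(
    [("a", 2), ("b", 5), ("c", 1), ("d", 3), ("e", 2)], max_batch=2
)
```

With static batching, the two-slot batch `{a, b}` would run for 5 steps before `c` could start; here `c` is admitted at step 2, the moment `a` finishes.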
### Description

```shell
I start triton server with '--model-control-mode poll'. A segmentation fault occurs when the model directory is modified.
```

### Reproduced Steps

```shell
1. CUDA_VISIBLE_DEVICES=3,4,5,6 /opt/tritonserver/bin/tritonserver --model-repository=/ft_workspace/all_models/t5/ --http-port 8008 --model-control-mode poll...
```