[BUG] baichuan-13b error
1. python -m lightllm.server.api_server --model_dir baichuan-13b --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 4096 --trust_remote_code
The server starts successfully and the log shows:
INFO: Started server process [560]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (
2. curl http://127.0.0.1:8080/generate -X POST -d '{"inputs":"What is AI?","parameters":{"max_new_tokens":17, "frequency_penalty":1}}' -H 'Content-Type: application/json'
3. The server raises an error and hangs, and the client hangs as well:
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector
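For reference, the same request from Python with an explicit timeout, so the client fails with an exception instead of hanging indefinitely (a minimal sketch using the requests package; host and port match the launch command above):

```python
import requests

payload = {
    "inputs": "What is AI?",
    "parameters": {"max_new_tokens": 17, "frequency_penalty": 1},
}
try:
    # The timeout turns the hang into a visible exception instead of blocking forever.
    resp = requests.post("http://127.0.0.1:8080/generate", json=payload, timeout=30)
    print(resp.status_code, resp.text)
except requests.exceptions.Timeout:
    print("server accepted the request but never answered (hang reproduced)")
```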
The same problem occurs when using the chatglm2-6b model.
@kuangdao @chatllm
The code has been tested on a range of GPUs, including A100, A800, 4090, and H800. If you are running on A100, A800, etc., we recommend triton==2.0.0.dev20221202 or triton==2.1.0. If you are running on H800, etc., you need to build and install [triton==2.1.0](https://github.com/openai/triton/tree/main) from source from the GitHub repository. If the code does not work on other GPUs, try modifying the triton kernels used in model inference. A quick smoke test to separate the two cases is sketched below.
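To tell a broken triton installation apart from a problem in lightllm's kernels, you can first run a trivial standalone kernel: if the same Allocation.cpp assertion fires here, the triton build itself is at fault. A minimal sketch (the standard vector-add example from the triton tutorials, assuming a CUDA-capable GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the tensor
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)  # compiles and runs -> triton itself is fine
print("triton smoke test passed")
```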
Install Triton Package
Use triton==2.0.0.dev20221202:
pip install triton==2.0.0.dev20221202
Use triton==2.1.0 (better performance, but the code is under continuous development and may be unstable):
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
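Either way, it can help to confirm which triton build the lightllm environment actually imports (a minimal check; the printed version should correspond to the package installed above):

```python
import triton
print(triton.__version__)  # should match the installed package, e.g. 2.0.0 or 2.1.0
```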