[BUG] baichuan-13b error
1. python -m lightllm.server.api_server --model_dir baichuan-13b --host 0.0.0.0 --port 8080 --tp 1 --max_total_token_num 4096 --trust_remote_code
The server starts successfully and the log shows:
INFO: Started server process [560]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (
2. curl http://127.0.0.1:8080/generate -X POST -d '{"inputs":"What is AI?","parameters":{"max_new_tokens":17, "frequency_penalty":1}}' -H 'Content-Type: application/json'
3. The server raises an error and hangs, and the client hangs as well:
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector
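For reference, the same request from Python with an explicit timeout, so the client fails with an exception instead of hanging indefinitely (a minimal sketch using the requests package; host and port match the launch command above):

```python
import requests

payload = {
    "inputs": "What is AI?",
    "parameters": {"max_new_tokens": 17, "frequency_penalty": 1},
}
try:
    # The timeout turns the hang into a visible exception instead of blocking forever.
    resp = requests.post("http://127.0.0.1:8080/generate", json=payload, timeout=30)
    print(resp.status_code, resp.text)
except requests.exceptions.Timeout:
    print("server accepted the request but never answered (hang reproduced)")
```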
The same problem occurs when using the chatglm2-6b model.
@kuangdao @chatllm
The code has been tested on a range of GPUs, including A100, A800, 4090, and H800. If you are running on A100, A800, etc., we recommend triton==2.0.0.dev20221202 or triton==2.1.0. If you are running on H800, etc., you need to build and install [triton==2.1.0](https://github.com/openai/triton/tree/main) from source from the GitHub repository. If the code does not work on other GPUs, try modifying the triton kernels used in model inference. A quick smoke test to separate the two cases is sketched below.
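To tell a broken triton installation apart from a problem in lightllm's kernels, you can first run a trivial standalone kernel: if the same Allocation.cpp assertion fires here, the triton build itself is at fault. A minimal sketch (the standard vector-add example from the triton tutorials, assuming a CUDA-capable GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the tensor
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)  # compiles and runs -> triton itself is fine
print("triton smoke test passed")
```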
Install Triton Package
Use triton==2.0.0.dev20221202:
pip install triton==2.0.0.dev20221202
Use triton==2.1.0 (better performance, but the code is under continuous development and may be unstable):
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
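Either way, it can help to confirm which triton build the lightllm environment actually imports (a minimal check; the printed version should correspond to the package installed above):

```python
import triton
print(triton.__version__)  # should match the installed package, e.g. 2.0.0 or 2.1.0
```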