
[BUG] cannot launch server after building from source

Open MisterBrookT opened this issue 11 months ago • 3 comments

I strictly followed the installation docs (https://lightllm-cn.readthedocs.io/en/latest/getting_started/installation.html#installation), and my GPU is an A800.

The error occurs when launching the server:

python -m lightllm.server.api_server --model_dir ~/autodl-pub/models/llama-7b/

INFO 12-24 20:14:05 [cache_tensor_manager.py:17] USE_GPU_TENSOR_CACHE is On
ERROR 12-24 20:14:05 [_custom_ops.py:51] vllm or lightllm_kernel is not installed, you can't use custom ops
INFO 12-24 20:14:05 [communication_op.py:41] vllm or lightllm_kernel is not installed, you can't use custom allreduce
/root/autodl-tmp/lightllm/lightllm/server/api_server.py:356: DeprecationWarning: on_event is deprecated, use lifespan event handlers instead.

    Read more about it in the
    [FastAPI docs for Lifespan Events](https://fastapi.tiangolo.com/advanced/events/).

@app.on_event("shutdown") /root/autodl-tmp/lightllm/lightllm/server/api_server.py:375: DeprecationWarning: on_event is deprecated, use lifespan event handlers instead.

    Read more about it in the
    [FastAPI docs for Lifespan Events](https://fastapi.tiangolo.com/advanced/events/).

@app.on_event("startup") WARNING 12-24 20:14:06 [tokenizer.py:66] load fast tokenizer fail: Descriptors cannot not be created directly. WARNING 12-24 20:14:06 [tokenizer.py:66] If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. WARNING 12-24 20:14:06 [tokenizer.py:66] If you cannot immediately regenerate your protos, some other possible workarounds are: WARNING 12-24 20:14:06 [tokenizer.py:66] 1. Downgrade the protobuf package to 3.20.x or lower. WARNING 12-24 20:14:06 [tokenizer.py:66] 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower). WARNING 12-24 20:14:06 [tokenizer.py:66] WARNING 12-24 20:14:06 [tokenizer.py:66] More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates Traceback (most recent call last): File "/root/autodl-tmp/lightllm/lightllm/server/tokenizer.py", line 62, in get_tokenizer tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=trust_remote_code, *args, **kwargs) File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 907, in from_pretrained return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs) File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2208, in from_pretrained return cls._from_pretrained( File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2442, in _from_pretrained tokenizer = cls(*init_inputs, **init_kwargs) File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 171, in init self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False)) File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 203, in get_spm_processor model_pb2 = import_protobuf(f"The new behaviour of {self.class.name} (with self.legacy = False)") File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/convert_slow_tokenizer.py", line 38, in import_protobuf from sentencepiece import sentencepiece_model_pb2 File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/sentencepiece/sentencepiece_model_pb2.py", line 34, in _descriptor.EnumValueDescriptor( File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 796, in new _message.Message._CheckCalledFromGeneratedFile() TypeError: Descriptors cannot not be created directly. If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/lightllm/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/lightllm/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/autodl-tmp/lightllm/lightllm/server/api_server.py", line 394, in <module>
    init_tokenizer(args)  # for openai api
  File "/root/autodl-tmp/lightllm/lightllm/server/build_prompt.py", line 8, in init_tokenizer
    tokenizer = get_tokenizer(args.model_dir, args.tokenizer_mode, trust_remote_code=args.trust_remote_code)
  File "/root/autodl-tmp/lightllm/lightllm/server/tokenizer.py", line 68, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=trust_remote_code, *args, **kwargs)
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 907, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2208, in from_pretrained
    return cls._from_pretrained(
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2442, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 171, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 203, in get_spm_processor
    model_pb2 = import_protobuf(f"The new behaviour of {self.__class__.__name__} (with self.legacy = False)")
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/transformers/convert_slow_tokenizer.py", line 38, in import_protobuf
    from sentencepiece import sentencepiece_model_pb2
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/sentencepiece/sentencepiece_model_pb2.py", line 16, in <module>
    DESCRIPTOR = _descriptor.FileDescriptor(
  File "/root/miniconda3/envs/lightllm/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 1066, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool: duplicate file name sentencepiece_model.proto
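For what it's worth, the call that ultimately fails at the bottom of both tracebacks is the plain transformers tokenizer load, so it can be reproduced outside of lightllm to confirm the protobuf incompatibility. A minimal sketch, assuming the same conda environment and reusing the model path from the launch command above (this check is not part of lightllm itself):

    # Reproduce the tokenizer load that fails in the traceback above.
    import os
    from transformers import AutoTokenizer

    model_dir = os.path.expanduser("~/autodl-pub/models/llama-7b/")

    # With an incompatible protobuf install this raises the same
    # "Descriptors cannot not be created directly" / duplicate proto TypeError.
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=False)
    print(type(tokenizer).__name__)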

MisterBrookT · Dec 24 '24

So I wonder whether using the container would run into fewer bugs...

MisterBrookT · Dec 24 '24

Downgrading the protobuf package will fix this error (see below); also make sure you are using HuggingFace-format models:

pip install protobuf==3.20.3
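If downgrading is not an option, the protobuf error message in the log above also lists an alternative workaround: forcing the pure-Python protobuf implementation before launch. A sketch based only on that message (not verified on this setup, and it will be noticeably slower than the default parser):

    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
    python -m lightllm.server.api_server --model_dir ~/autodl-pub/models/llama-7b/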

WuSiYu · Jan 03 '25

Downgrading the protobuf package will fix this error (see below); also make sure you are using HuggingFace-format models:

pip install protobuf==3.20.3

After downgrading, Llama's output has become very strange, e.g. "Hinweis: Der Artikel ist in englischer Sprache verfasst." Why is this happening?

Rayfxl · Oct 23 '25