LongWriter
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) when running vllm
System Info
CUDA Version: 12.2
transformers Version: 4.44.2
Python: 3.12.4
Operating system: Windows Subsystem for Linux (WSL) in VS Code
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Reproduction
- Create a new conda environment with Python 3.12 and install vLLM.
- Copy vllm_inference.py and run it.
- The model was downloaded through vLLM.
- Error:
INFO 08-30 11:07:34 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='THUDM/LongWriter-glm4-9b', speculative_config=None, tokenizer='THUDM/LongWriter-glm4-9b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=THUDM/LongWriter-glm4-9b, use_v2_block_manager=False, enable_prefix_caching=False)
WARNING 08-30 11:12:36 tokenizer.py:137] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
WARNING 08-30 11:12:36 utils.py:721] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
INFO 08-30 11:12:37 model_runner.py:879] Starting to load model THUDM/LongWriter-glm4-9b...
INFO 08-30 11:12:38 weight_utils.py:236] Using model weights format ['*.safetensors']
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/c/Users/Documents/longwriter/lw-vllm.py", line 2, in <module>
[rank0]: model = LLM(
[rank0]: ^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 175, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 473, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 270, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 46, in __init__
[rank0]: self._init_executor()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/executor/gpu_executor.py", line 39, in _init_executor
[rank0]: self.driver_worker.load_model()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/worker/worker.py", line 182, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 881, in load_model
[rank0]: self.model = get_model(model_config=self.model_config,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
[rank0]: return loader.load_model(model_config=model_config,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 345, in load_model
[rank0]: self._get_weights_iterator(model_config.model,
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 306, in _get_weights_iterator
[rank0]: hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 289, in _prepare_weights
[rank0]: hf_weights_files = filter_duplicate_safetensors_files(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 301, in filter_duplicate_safetensors_files
[rank0]: weight_map = json.load(index_file)["weight_map"]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/__init__.py", line 293, in load
[rank0]: return loads(fp.read(),
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/__init__.py", line 346, in loads
[rank0]: return _default_decoder.decode(s)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/decoder.py", line 337, in decode
[rank0]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/decoder.py", line 355, in raw_decode
[rank0]: raise JSONDecodeError("Expecting value", s, err.value) from None
[rank0]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
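The failure happens while vLLM parses `model.safetensors.index.json` from the local model cache: `json.load` hits non-JSON content at the very first byte. A common cause (an assumption here, not confirmed by the log) is an incomplete download, where the index file is still a Git LFS pointer stub or an empty/truncated file. This minimal, self-contained sketch reproduces the exact decoder error with a stand-in file:

```python
import json
import os
import tempfile

# Simulate a broken index file: a Git LFS pointer stub instead of real JSON.
# (Hypothetical content -- the actual file on disk may simply be empty or truncated.)
pointer_text = "version https://git-lfs.github.com/spec/v1\n"

with tempfile.TemporaryDirectory() as d:
    index_path = os.path.join(d, "model.safetensors.index.json")
    with open(index_path, "w") as f:
        f.write(pointer_text)

    try:
        with open(index_path) as f:
            weight_map = json.load(f)["weight_map"]  # same call as in weight_utils.py
    except json.JSONDecodeError as e:
        # Matches the traceback: "Expecting value: line 1 column 1 (char 0)"
        print(f"JSONDecodeError: {e}")
```

Inspecting the first line of the cached index file (e.g. with `head`) tells you immediately whether it is real JSON (it should start with `{`) or a pointer/empty file.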
[rank0]:[W830 11:12:39.523642430 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Expected behavior
An article is generated, just as when running the model with Hugging Face Transformers.
Hi, can you try updating your vllm version to 0.5.4?
Thanks for the reply. I tried updating the vLLM version by running
pip install vllm==0.5.4
Here are the updated environment packages:
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
aiohappyeyeballs 2.4.0 pypi_0 pypi
aiohttp 3.10.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.4.0 pypi_0 pypi
attrs 24.2.0 pypi_0 pypi
audioread 3.0.1 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.7.2 h06a4308_0
certifi 2024.8.30 pypi_0 pypi
cffi 1.17.0 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
cloudpickle 3.0.0 pypi_0 pypi
cmake 3.30.2 pypi_0 pypi
datasets 2.21.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
expat 2.6.2 h6a678d5_0
fastapi 0.112.2 pypi_0 pypi
filelock 3.15.4 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
gguf 0.9.1 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.5 pypi_0 pypi
httptools 0.6.1 pypi_0 pypi
httpx 0.27.2 pypi_0 pypi
huggingface-hub 0.24.6 pypi_0 pypi
idna 3.8 pypi_0 pypi
importlib-metadata 8.4.0 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
jiter 0.5.0 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
lark 1.2.2 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
librosa 0.10.2.post1 pypi_0 pypi
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
llvmlite 0.43.0 pypi_0 pypi
lm-format-enforcer 0.10.3 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.0.8 pypi_0 pypi
msgspec 0.18.6 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 pypi_0 pypi
networkx 3.3 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
numba 0.60.0 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-ml-py 12.560.30 pypi_0 pypi
nvidia-nccl-cu12 2.20.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.6.68 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
openai 1.43.0 pypi_0 pypi
openssl 3.0.14 h5eee18b_0
outlines 0.0.46 pypi_0 pypi
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.2 py312h06a4308_0
platformdirs 4.2.2 pypi_0 pypi
pooch 1.8.2 pypi_0 pypi
prometheus-client 0.20.0 pypi_0 pypi
prometheus-fastapi-instrumentator 7.0.0 pypi_0 pypi
protobuf 5.28.0 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pyairports 2.1.1 pypi_0 pypi
pyarrow 17.0.0 pypi_0 pypi
pycountry 24.6.1 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pydantic 2.8.2 pypi_0 pypi
pydantic-core 2.20.1 pypi_0 pypi
python 3.12.4 h5148396_1
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
pyzmq 26.2.0 pypi_0 pypi
ray 2.35.0 pypi_0 pypi
readline 8.2 h5eee18b_0
referencing 0.35.1 pypi_0 pypi
regex 2024.7.24 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
rpds-py 0.20.0 pypi_0 pypi
safetensors 0.4.4 pypi_0 pypi
scikit-learn 1.5.1 pypi_0 pypi
scipy 1.14.1 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 72.1.0 py312h06a4308_0
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soundfile 0.12.1 pypi_0 pypi
soxr 0.5.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
starlette 0.38.2 pypi_0 pypi
sympy 1.13.2 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tiktoken 0.7.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
tokenizers 0.19.1 pypi_0 pypi
torch 2.4.0 pypi_0 pypi
torchvision 0.19.0 pypi_0 pypi
tqdm 4.66.5 pypi_0 pypi
transformers 4.44.2 pypi_0 pypi
triton 3.0.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
uvicorn 0.30.6 pypi_0 pypi
uvloop 0.20.0 pypi_0 pypi
vllm 0.5.4 pypi_0 pypi
vllm-flash-attn 2.6.1 pypi_0 pypi
watchfiles 0.24.0 pypi_0 pypi
websockets 13.0.1 pypi_0 pypi
wheel 0.43.0 py312h06a4308_0
xformers 0.0.27.post2 pypi_0 pypi
xxhash 3.5.0 pypi_0 pypi
xz 5.4.6 h5eee18b_1
yarl 1.9.4 pypi_0 pypi
zipp 3.20.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_1
However, the same issue occurred:
INFO 09-04 16:09:15 llm_engine.py:174] Initializing an LLM engine (v0.5.4) with config: model='THUDM/LongWriter-glm4-9b', speculative_config=None, tokenizer='THUDM/LongWriter-glm4-9b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=THUDM/LongWriter-glm4-9b, use_v2_block_manager=False, enable_prefix_caching=False)
WARNING 09-04 16:09:16 tokenizer.py:129] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
WARNING 09-04 16:09:16 utils.py:578] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
INFO 09-04 16:09:17 model_runner.py:720] Starting to load model THUDM/LongWriter-glm4-9b...
INFO 09-04 16:09:18 weight_utils.py:225] Using model weights format ['*.safetensors']
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/c/Users/CSOC/Documents/longwriter/lw-vllm.py", line 2, in <module>
[rank0]: model = LLM(
[rank0]: ^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 158, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 445, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 249, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 47, in __init__
[rank0]: self._init_executor()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/executor/gpu_executor.py", line 36, in _init_executor
[rank0]: self.driver_worker.load_model()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/worker/worker.py", line 139, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 722, in load_model
[rank0]: self.model = get_model(model_config=self.model_config,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
[rank0]: return loader.load_model(model_config=model_config,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 328, in load_model
[rank0]: self._get_weights_iterator(model_config.model,
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 288, in _get_weights_iterator
[rank0]: hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 271, in _prepare_weights
[rank0]: hf_weights_files = filter_duplicate_safetensors_files(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 290, in filter_duplicate_safetensors_files
[rank0]: weight_map = json.load(index_file)["weight_map"]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/__init__.py", line 293, in load
[rank0]: return loads(fp.read(),
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/__init__.py", line 346, in loads
[rank0]: return _default_decoder.decode(s)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/decoder.py", line 337, in decode
[rank0]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/decoder.py", line 355, in raw_decode
[rank0]: raise JSONDecodeError("Expecting value", s, err.value) from None
[rank0]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Hi, is it possible that your model was not downloaded successfully? You could try downloading the model to a local directory first, and then load it into vLLM from the local path.
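One way to act on this advice is to validate the files in a local download before pointing vLLM at them. The sketch below (stdlib only; the directory path is a placeholder) checks that every `*.json` file in a model directory actually parses, which would catch the truncated or pointer-stub index file behind the traceback above:

```python
import json
from pathlib import Path

def find_broken_json_files(model_dir):
    """Return paths of *.json files under model_dir that fail to parse as JSON."""
    broken = []
    for path in sorted(Path(model_dir).glob("*.json")):
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError):
            broken.append(path)
    return broken

# Hypothetical local path after downloading THUDM/LongWriter-glm4-9b manually:
# bad = find_broken_json_files("/path/to/LongWriter-glm4-9b")
# If `bad` is empty, LLM(model="/path/to/LongWriter-glm4-9b", ...) should get past
# the weight-index step; otherwise re-download the listed files.
```

If any file is reported broken, re-downloading the model (and confirming the safetensors shards are their full multi-gigabyte size, not kilobyte-sized pointer files) should resolve the `Expecting value` error.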