yk1012664593
llama model init start

```
INFO 04-26 17:03:13 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='/mnt/deep_learning_test/testsuite/dataset/llms_inference_llama7b-v2_accelerate/checkpoint/7B-V2/', speculative_config=None, tokenizer='/mnt/deep_learning_test/testsuite/dataset/llms_inference_llama7b-v2_accelerate/checkpoint/7B-V2/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, ...
```
> Does it happen every time, or only for certain prompts?
>
> Also:
>
> > If you hit a crash or hang, set `export VLLM_TRACE_FUNCTION=1` and every function call inside vllm will be recorded. Check those log files to determine which function crashed or hung.

Yes, this issue is inevitable. On the H20 GPU, every vllm version will hit it when running in float16 precision...
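As a minimal sketch of the tracing workflow suggested above: `VLLM_TRACE_FUNCTION` has to be set in the launching process before the engine starts, so the simplest pattern is to export it from Python before importing vllm. The model path in the comment is a placeholder, not the reporter's actual checkpoint.

```python
import os

# VLLM_TRACE_FUNCTION enables vLLM's function-call tracing; it must be
# set before the engine process starts so every vllm call is logged.
# The resulting trace files show which call crashed or hung.
os.environ["VLLM_TRACE_FUNCTION"] = "1"

# The engine would be created after this point, e.g. (requires vllm):
#   from vllm import LLM
#   llm = LLM(model="/path/to/7B-V2/", dtype="float16")  # hypothetical path
print(os.environ["VLLM_TRACE_FUNCTION"])
```

Note that setting the variable after the engine has already initialized has no effect, which is why the export belongs at the very top of the script or in the shell before launch.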