[Bug]: Issue when benchmarking the dynamically served LoRA adapter
My current environment

```
[pip3] numpy==2.1.1
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==25.1.2
[pip3] torch==2.4.1
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] numpy 2.1.1 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.68 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
[conda] pyzmq 25.1.2 py311h6a678d5_0
[conda] torch 2.4.1 pypi_0 pypi
[conda] transformers 4.44.2 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
```
Model Input Dumps
No response
🐛 Describe the bug
I'm serving a LoRA adapter dynamically with:
```shell
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
curl -X POST http://address_to_model/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "meta-llama/Meta-Llama-3.1-8B-Instruct", "lora_path": "path/to/epoch_9"}'
```
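As a sanity check after loading (a sketch; `address_to_model` is the same placeholder as above), the server's model list should include the adapter name alongside the base model once the load succeeds:

```shell
# List the models the OpenAI-compatible server currently serves;
# a successfully loaded LoRA adapter should appear in the response.
curl http://address_to_model/v1/models
```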
The model named meta-llama/Meta-Llama-3.1-8B-Instruct is running in a Kubernetes pod with a single A100 GPU. I then benchmarked it with the LM Evaluation Harness (https://github.com/EleutherAI/lm-evaluation-harness?tab=readme-ov-file#model-apis-and-inference-servers):
```shell
lm_eval --model local-completions \
  --tasks mmlu \
  --apply_chat_template \
  --model_args model=meta-llama/Meta-Llama-3.1-8B-Instruct,base_url=http://address_to_model/v1/completions,num_concurrent=10,max_retries=3,tokenizer_backend=huggingface \
  --use_cache \
  --output_path path/to/output
```
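Since the crash happens under ten concurrent requests, rerunning the same benchmark with lower client concurrency can help show whether the failure tracks the request load (a sketch reusing the placeholders above):

```shell
# Same benchmark, but with only one request in flight at a time.
lm_eval --model local-completions \
  --tasks mmlu \
  --apply_chat_template \
  --model_args model=meta-llama/Meta-Llama-3.1-8B-Instruct,base_url=http://address_to_model/v1/completions,num_concurrent=1,max_retries=3,tokenizer_backend=huggingface \
  --output_path path/to/output
```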
OUTPUT
n a car\'s radiator, cooling the body to prevent rapid increases in core body temperature and promoting heat tolerance… Repeated sauna use acclimates the body to heat and optimizes the body\'s response to future exposures, likely due to a biological phenomenon known as hormesis, a compensatory defense response following exposure to a mild stressor that is disproportionate to the magnitude of the stressor. Hormesis triggers a vast array of protective mechanisms that not only repair cell damage but also provide protection from subsequent exposures to more devastating stressors… The physiological responses to sauna use are remarkably similar to those experienced during moderate- to vigorous-intensity exercise. In fact, sauna use has been proposed as an alternative to exercise for people who are unable to engage in physical activity due to chronic disease or physical limitations.[13]\n\nBased on the article, what would be an important thing for a person to do after sauna use?\nA. Shower in cold water.\nB. Exercise.\nC. Eat a meal.\nD. 
Replenish fluids with filtered water.\nAnswer:<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n D', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=1234, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1, min_tokens=0, logprobs=1, prompt_logprobs=1, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 1627, 10263, 220, 2366, 19, 271, 791, 2768, 527, 5361, 5873, 4860, 320, 4291, 11503, 8, 922, 7926, 16088, 13, 128009, 128006, 882, 128007, 271, 53379, 8733, 1005, 11, 7170, 14183, 311, 439, 330, 9258, 8733, 73509, 1359, 374, 32971, 555, 2875, 9860, 28979, 14675, 311, 14560, 8798, 13, 1115, 14675, 658, 51650, 23900, 17508, 700, 91299, 1389, 459, 5376, 304, 279, 2547, 596, 6332, 9499, 1389, 430, 90974, 264, 30945, 461, 70, 38220, 2077, 16239, 18247, 408, 78738, 11, 41713, 11, 323, 9693, 3565, 4744, 96978, 24717, 430, 990, 3871, 311, 15301, 2162, 537, 10949, 323, 3044, 279, 2547, 369, 3938, 8798, 8631, 1105, 1981, 763, 3293, 11026, 11, 47958, 73509, 706, 22763, 439, 264, 3445, 311, 5376, 61961, 323, 7417, 8244, 2890, 11, 3196, 389, 29722, 828, 505, 90380, 11, 958, 44322, 11, 323, 7852, 4633, 7978, 13, 5046, 4040, 2802, 527, 279, 14955, 505, 7978, 315, 13324, 304, 279, 33479, 454, 822, 2209, 2464, 292, 18449, 31974, 32388, 38829, 320, 82071, 19694, 8, 19723, 11, 459, 14529, 33547, 7187, 6108, 41944, 4007, 315, 2890, 20124, 304, 810, 1109, 220, 17, 11, 3101, 6278, 57859, 3026, 505, 24024, 37355, 11, 902, 11054, 3831, 7902, 1990, 47958, 1005, 323, 11293, 4648, 323, 8624, 1981, 578, 735, 40, 19694, 14955, 8710, 430, 3026, 889, 1511, 279, 47958, 1403, 311, 2380, 3115, 824, 2046, 
1051, 220, 1544, 3346, 2753, 4461, 311, 2815, 505, 41713, 14228, 11384, 1109, 3026, 889, 3287, 956, 1005, 279, 47958, 8032, 17, 60, 24296, 11, 279, 7720, 814, 10534, 1051, 1766, 311, 387, 19660, 43918, 25, 11258, 889, 1511, 279, 47958, 17715, 11157, 439, 3629, 11, 922, 3116, 311, 8254, 3115, 824, 2046, 11, 10534, 17715, 11157, 279, 7720, 1389, 323, 1051, 220, 1135, 3346, 2753, 4461, 311, 2815, 505, 41713, 14228, 11384, 8032, 17, 60, 763, 5369, 11, 21420, 47958, 3932, 1051, 1766, 311, 387, 220, 1272, 3346, 2753, 4461, 311, 2815, 505, 682, 11384, 315, 42227, 4648, 13, 4314, 14955, 5762, 837, 1524, 994, 13126, 4325, 11, 5820, 5990, 11, 323, 19433, 9547, 430, 2643, 617, 28160, 279, 3026, 596, 2890, 8032, 17, 60, 1131, 578, 735, 40, 19694, 1101, 10675, 430, 21420, 47958, 1005, 11293, 279, 5326, 315, 11469, 52857, 323, 44531, 596, 8624, 304, 264, 19660, 43918, 11827, 13, 11258, 889, 1511, 279, 47958, 1403, 311, 2380, 3115, 824, 2046, 1047, 264, 220, 2287, 3346, 4827, 5326, 315, 11469, 52857, 323, 264, 220, 2397, 3346, 4827, 5326, 315, 11469, 44531, 596, 8624, 11, 7863, 311, 3026, 889, 1511, 279, 47958, 1193, 832, 892, 824, 2046, 1981, 578, 2890, 7720, 5938, 449, 47958, 1005, 11838, 311, 1023, 13878, 315, 10723, 2890, 11, 439, 1664, 13, 11258, 24435, 304, 279, 735, 40, 19694, 4007, 889, 1511, 279, 47958, 3116, 311, 8254, 3115, 824, 2046, 1051, 220, 2813, 3346, 2753, 4461, 311, 2274, 94241, 24673, 11, 15851, 315, 279, 3026, 596, 34625, 26870, 11, 80431, 2704, 11, 7106, 5820, 11, 323, 47288, 2704, 320, 300, 17303, 555, 356, 31696, 535, 13128, 8, 1981, 849, 12313, 311, 1579, 9499, 59623, 279, 2547, 11, 95360, 5977, 264, 11295, 11, 22514, 2077, 13, 578, 6930, 323, 6332, 2547, 20472, 5376, 88101, 11, 323, 81366, 4675, 1157, 13, 578, 6930, 77662, 1176, 11, 16448, 311, 220, 1272, 32037, 320, 6849, 59572, 705, 323, 1243, 4442, 304, 6332, 2547, 9499, 12446, 11, 16448, 14297, 505, 220, 1806, 32037, 320, 3264, 13, 21, 59572, 11, 477, 4725, 8, 311, 220, 1987, 32037, 320, 1041, 13, 
19, 59572, 8, 323, 1243, 19019, 7859, 311, 220, 2137, 32037, 320, 4278, 13, 17, 59572, 8, 1981, 220, 6938, 18029, 2612, 11, 264, 6767, 315, 279, 3392, 315, 990, 279, 4851, 27772, 304, 2077, 311, 279, 2547, 596, 1205, 369, 24463, 11, 12992, 555, 220, 1399, 311, 220, 2031, 3346, 11, 1418, 279, 4851, 4478, 320, 1820, 1396, 315, 34427, 824, 9568, 8, 12992, 323, 279, 12943, 8286, 320, 1820, 3392, 315, 6680, 62454, 8, 8625, 35957, 8032, 20, 60, 12220, 420, 892, 11, 13489, 220, 1135, 311, 220, 2031, 3346, 315, 279, 2547, 596, 6680, 6530, 374, 74494, 505, 279, 6332, 311, 279, 6930, 311, 28696, 81366, 13, 578, 5578, 1732, 33291, 13489, 220, 15, 13, 20, 21647, 315, 28566, 1418, 47958, 73509, 8032, 806, 60, 6515, 1088, 8798, 14675, 1101, 90974, 264, 41658, 5376, 304, 8244, 32426, 8286, 311, 50460, 279, 18979, 304, 6332, 6680, 8286, 13, 1115, 5376, 304, 32426, 8286, 539, 1193, 5825, 264, 21137, 2592, 315, 15962, 369, 81366, 11, 719, 433, 1101, 14385, 1093, 279, 3090, 304, 264, 1841, 596, 78190, 11, 28015, 279, 2547, 311, 5471, 11295, 12992, 304, 6332, 2547, 9499, 323, 22923, 8798, 25065, 1981, 1050, 43054, 47958, 1005, 1645, 566, 48571, 279, 2547, 311, 8798, 323, 7706, 4861, 279, 2547, 596, 2077, 311, 3938, 70530, 11, 4461, 4245, 311, 264, 24156, 25885, 3967, 439, 21548, 14093, 11, 264, 14573, 5382, 9232, 2077, 2768, 14675, 311, 264, 23900, 8631, 269, 430, 374, 80153, 311, 279, 26703, 315, 279, 8631, 269, 13, 92208, 14093, 31854, 264, 13057, 1358, 315, 29219, 24717, 430, 539, 1193, 13023, 2849, 5674, 719, 1101, 3493, 9313, 505, 17876, 70530, 311, 810, 33318, 8631, 1105, 1981, 578, 53194, 14847, 311, 47958, 1005, 527, 49723, 4528, 311, 1884, 10534, 2391, 24070, 12, 311, 71920, 20653, 8127, 10368, 13, 763, 2144, 11, 47958, 1005, 706, 1027, 11223, 439, 459, 10778, 311, 10368, 369, 1274, 889, 527, 12153, 311, 16988, 304, 7106, 5820, 4245, 311, 21249, 8624, 477, 7106, 9669, 8032, 1032, 2595, 29815, 389, 279, 4652, 11, 1148, 1053, 387, 459, 3062, 3245, 369, 264, 1732, 311, 656, 
1306, 47958, 1005, 5380, 32, 13, 48471, 304, 9439, 3090, 627, 33, 13, 33918, 627, 34, 13, 45614, 264, 15496, 627, 35, 13, 1050, 87635, 819, 56406, 449, 18797, 3090, 627, 16533, 25, 128009, 128006, 78191, 128007, 271, 423], lora_request: LoRARequest(lora_name='meta-llama/Meta-Llama-3.1-8B-Instruct', lora_int_id=1, lora_path='here_is_path_to_lora', lora_local_path=None, long_lora_max_len=None), prompt_adapter_request: None.
INFO 09-18 02:18:58 logger.py:36] Received request cmpl-215927f7b38d4107bf9fef896613dadb-0: prompt: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\nThe following are multiple choice questions (with answers) about college medicine.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nSauna use, sometimes referred to as "sauna bathing," is characterized by short-term passive exposure to extreme heat. This exposure elicits mild hyperthermia – an increase in the body\'s core temperature – that induces a thermoregulatory response involving neuroendocrine, cardiovascular, and cytoprotective mechanisms that work together to restore homeostasis and condition the body for future heat stressors… In recent decades, sauna bathing has emerged as a means to increase lifespan and improve overall health, based on compelling data from observational, interventional, and mechanistic studies. Of particular interest are the findings from studies of participants in the Kuopio Ischemic Heart Disease Risk Factor (KIHD) Study, an ongoing prospective population-based cohort study of health outcomes in more than 2,300 middle-aged men from eastern Finland, which identified strong links between sauna use and reduced death and disease… The KIHD findings showed that men who used the sauna two to three times per week were 27 percent less likely to die from cardiovascular-related causes than men who didn\'t use the sauna.[2] Furthermore, the benefits they experienced were found to be dose-dependent: Men who used the sauna roughly twice as often, about four to seven times per week, experienced roughly twice the benefits – and were 50 percent less likely to die from cardiovascular-related causes.[2] In addition, frequent sauna users were found to be 40 percent less likely to die from all causes of premature death. 
These findings held true even when considering age, activity levels, and lifestyle factors that might have influenced the men\'s health.[2]... The KIHD also revealed that frequent sauna use reduced the risk of developing dementia and Alzheimer\'s disease in a dose-dependent manner. Men who used the sauna two to three times per week had a 66 percent lower risk of developing dementia and a 65 percent lower risk of developing Alzheimer\'s disease, compared to men who used the sauna only one time per week… The health benefits associated with sauna use extended to other aspects of mental health, as well. Men participating in the KIHD study who used the sauna four to seven times per week were 77 percent less likely to develop psychotic disorders, regardless of the men\'s dietary habits, socioeconomic status, physical activity, and inflammatory status (as measured by C-reactive protein)…Exposure to high temperature stresses the body, eliciting a rapid, robust response. The skin and core body temperatures increase markedly, and sweating ensues. The skin heats first, rising to 40°C (104°F), and then changes in core body temperature occur, rising slowly from 37°C (98.6°F, or normal) to 38°C (100.4°F) and then rapidly increasing to 39°C (102.2°F)… Cardiac output, a measure of the amount of work the heart performs in response to the body\'s need for oxygen, increases by 60 to 70 percent, while the heart rate (the number of beats per minute) increases and the stroke volume (the amount of blood pumped) remains unchanged.[5] During this time, approximately 50 to 70 percent of the body\'s blood flow is redistributed from the core to the skin to facilitate sweating. The average person loses approximately 0.5 kg of sweat while sauna bathing.[11] Acute heat exposure also induces a transient increase in overall plasma volume to mitigate the decrease in core blood volume. 
This increase in plasma volume not only provides a reserve source of fluid for sweating, but it also acts like the water in a car\'s radiator, cooling the body to prevent rapid increases in core body temperature and promoting heat tolerance… Repeated sauna use acclimates the body to heat and optimizes the body\'s response to future exposures, likely due to a biological phenomenon known as hormesis, a compensatory defense response following exposure to a mild stressor that is disproportionate to the magnitude of the stressor. Hormesis triggers a vast array of protective mechanisms that not only repair cell damage but also provide protection from subsequent exposures to more devastating stressors… The physiological responses to sauna use are remarkably similar to those experienced during moderate- to vigorous-intensity exercise. In fact, sauna use has been proposed as an alternative to exercise for people who are unable to engage in physical activity due to chronic disease or physical limitations.[13]\n\nBased on the article, what would be an important thing for a person to do after sauna use?\nA. Shower in cold water.\nB. Exercise.\nC. Eat a meal.\nD. 
Replenish fluids with filtered water.\nAnswer:<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n B', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=1234, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1, min_tokens=0, logprobs=1, prompt_logprobs=1, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 1627, 10263, 220, 2366, 19, 271, 791, 2768, 527, 5361, 5873, 4860, 320, 4291, 11503, 8, 922, 7926, 16088, 13, 128009, 128006, 882, 128007, 271, 53379, 8733, 1005, 11, 7170, 14183, 311, 439, 330, 9258, 8733, 73509, 1359, 374, 32971, 555, 2875, 9860, 28979, 14675, 311, 14560, 8798, 13, 1115, 14675, 658, 51650, 23900, 17508, 700, 91299, 1389, 459, 5376, 304, 279, 2547, 596, 6332, 9499, 1389, 430, 90974, 264, 30945, 461, 70, 38220, 2077, 16239, 18247, 408, 78738, 11, 41713, 11, 323, 9693, 3565, 4744, 96978, 24717, 430, 990, 3871, 311, 15301, 2162, 537, 10949, 323, 3044, 279, 2547, 369, 3938, 8798, 8631, 1105, 1981, 763, 3293, 11026, 11, 47958, 73509, 706, 22763, 439, 264, 3445, 311, 5376, 61961, 323, 7417, 8244, 2890, 11, 3196, 389, 29722, 828, 505, 90380, 11, 958, 44322, 11, 323, 7852, 4633, 7978, 13, 5046, 4040, 2802, 527, 279, 14955, 505, 7978, 315, 13324, 304, 279, 33479, 454, 822, 2209, 2464, 292, 18449, 31974, 32388, 38829, 320, 82071, 19694, 8, 19723, 11, 459, 14529, 33547, 7187, 6108, 41944, 4007, 315, 2890, 20124, 304, 810, 1109, 220, 17, 11, 3101, 6278, 57859, 3026, 505, 24024, 37355, 11, 902, 11054, 3831, 7902, 1990, 47958, 1005, 323, 11293, 4648, 323, 8624, 1981, 578, 735, 40, 19694, 14955, 8710, 430, 3026, 889, 1511, 279, 47958, 1403, 311, 2380, 3115, 824, 2046, 
1051, 220, 1544, 3346, 2753, 4461, 311, 2815, 505, 41713, 14228, 11384, 1109, 3026, 889, 3287, 956, 1005, 279, 47958, 8032, 17, 60, 24296, 11, 279, 7720, 814, 10534, 1051, 1766, 311, 387, 19660, 43918, 25, 11258, 889, 1511, 279, 47958, 17715, 11157, 439, 3629, 11, 922, 3116, 311, 8254, 3115, 824, 2046, 11, 10534, 17715, 11157, 279, 7720, 1389, 323, 1051, 220, 1135, 3346, 2753, 4461, 311, 2815, 505, 41713, 14228, 11384, 8032, 17, 60, 763, 5369, 11, 21420, 47958, 3932, 1051, 1766, 311, 387, 220, 1272, 3346, 2753, 4461, 311, 2815, 505, 682, 11384, 315, 42227, 4648, 13, 4314, 14955, 5762, 837, 1524, 994, 13126, 4325, 11, 5820, 5990, 11, 323, 19433, 9547, 430, 2643, 617, 28160, 279, 3026, 596, 2890, 8032, 17, 60, 1131, 578, 735, 40, 19694, 1101, 10675, 430, 21420, 47958, 1005, 11293, 279, 5326, 315, 11469, 52857, 323, 44531, 596, 8624, 304, 264, 19660, 43918, 11827, 13, 11258, 889, 1511, 279, 47958, 1403, 311, 2380, 3115, 824, 2046, 1047, 264, 220, 2287, 3346, 4827, 5326, 315, 11469, 52857, 323, 264, 220, 2397, 3346, 4827, 5326, 315, 11469, 44531, 596, 8624, 11, 7863, 311, 3026, 889, 1511, 279, 47958, 1193, 832, 892, 824, 2046, 1981, 578, 2890, 7720, 5938, 449, 47958, 1005, 11838, 311, 1023, 13878, 315, 10723, 2890, 11, 439, 1664, 13, 11258, 24435, 304, 279, 735, 40, 19694, 4007, 889, 1511, 279, 47958, 3116, 311, 8254, 3115, 824, 2046, 1051, 220, 2813, 3346, 2753, 4461, 311, 2274, 94241, 24673, 11, 15851, 315, 279, 3026, 596, 34625, 26870, 11, 80431, 2704, 11, 7106, 5820, 11, 323, 47288, 2704, 320, 300, 17303, 555, 356, 31696, 535, 13128, 8, 1981, 849, 12313, 311, 1579, 9499, 59623, 279, 2547, 11, 95360, 5977, 264, 11295, 11, 22514, 2077, 13, 578, 6930, 323, 6332, 2547, 20472, 5376, 88101, 11, 323, 81366, 4675, 1157, 13, 578, 6930, 77662, 1176, 11, 16448, 311, 220, 1272, 32037, 320, 6849, 59572, 705, 323, 1243, 4442, 304, 6332, 2547, 9499, 12446, 11, 16448, 14297, 505, 220, 1806, 32037, 320, 3264, 13, 21, 59572, 11, 477, 4725, 8, 311, 220, 1987, 32037, 320, 1041, 13, 
19, 59572, 8, 323, 1243, 19019, 7859, 311, 220, 2137, 32037, 320, 4278, 13, 17, 59572, 8, 1981, 220, 6938, 18029, 2612, 11, 264, 6767, 315, 279, 3392, 315, 990, 279, 4851, 27772, 304, 2077, 311, 279, 2547, 596, 1205, 369, 24463, 11, 12992, 555, 220, 1399, 311, 220, 2031, 3346, 11, 1418, 279, 4851, 4478, 320, 1820, 1396, 315, 34427, 824, 9568, 8, 12992, 323, 279, 12943, 8286, 320, 1820, 3392, 315, 6680, 62454, 8, 8625, 35957, 8032, 20, 60, 12220, 420, 892, 11, 13489, 220, 1135, 311, 220, 2031, 3346, 315, 279, 2547, 596, 6680, 6530, 374, 74494, 505, 279, 6332, 311, 279, 6930, 311, 28696, 81366, 13, 578, 5578, 1732, 33291, 13489, 220, 15, 13, 20, 21647, 315, 28566, 1418, 47958, 73509, 8032, 806, 60, 6515, 1088, 8798, 14675, 1101, 90974, 264, 41658, 5376, 304, 8244, 32426, 8286, 311, 50460, 279, 18979, 304, 6332, 6680, 8286, 13, 1115, 5376, 304, 32426, 8286, 539, 1193, 5825, 264, 21137, 2592, 315, 15962, 369, 81366, 11, 719, 433, 1101, 14385, 1093, 279, 3090, 304, 264, 1841, 596, 78190, 11, 28015, 279, 2547, 311, 5471, 11295, 12992, 304, 6332, 2547, 9499, 323, 22923, 8798, 25065, 1981, 1050, 43054, 47958, 1005, 1645, 566, 48571, 279, 2547, 311, 8798, 323, 7706, 4861, 279, 2547, 596, 2077, 311, 3938, 70530, 11, 4461, 4245, 311, 264, 24156, 25885, 3967, 439, 21548, 14093, 11, 264, 14573, 5382, 9232, 2077, 2768, 14675, 311, 264, 23900, 8631, 269, 430, 374, 80153, 311, 279, 26703, 315, 279, 8631, 269, 13, 92208, 14093, 31854, 264, 13057, 1358, 315, 29219, 24717, 430, 539, 1193, 13023, 2849, 5674, 719, 1101, 3493, 9313, 505, 17876, 70530, 311, 810, 33318, 8631, 1105, 1981, 578, 53194, 14847, 311, 47958, 1005, 527, 49723, 4528, 311, 1884, 10534, 2391, 24070, 12, 311, 71920, 20653, 8127, 10368, 13, 763, 2144, 11, 47958, 1005, 706, 1027, 11223, 439, 459, 10778, 311, 10368, 369, 1274, 889, 527, 12153, 311, 16988, 304, 7106, 5820, 4245, 311, 21249, 8624, 477, 7106, 9669, 8032, 1032, 2595, 29815, 389, 279, 4652, 11, 1148, 1053, 387, 459, 3062, 3245, 369, 264, 1732, 311, 656, 
1306, 47958, 1005, 5380, 32, 13, 48471, 304, 9439, 3090, 627, 33, 13, 33918, 627, 34, 13, 45614, 264, 15496, 627, 35, 13, 1050, 87635, 819, 56406, 449, 18797, 3090, 627, 16533, 25, 128009, 128006, 78191, 128007, 271, 426], lora_request: LoRARequest(lora_name='meta-llama/Meta-Llama-3.1-8B-Instruct', lora_int_id=1, lora_path='/here_is_path_to_lora', lora_local_path=None, long_lora_max_len=None), prompt_adapter_request: None.
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:4772 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:15180 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:39275 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:55198 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:26696 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:1187 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:19113 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:44661 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:35802 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:10019 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO 09-18 02:18:58 logger.py:36] Received request cmpl-f15b280099c04bf7b666b102190147d8-0: prompt: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\nThe following are multiple choice questions (with answers) about high school world history.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nThis question refers to the following information.\nSource 1:\n"You may well ask: "Why direct action? Why sit-ins, marches and so forth? Isn\'t negotiation a better path?" You are quite right in calling, for negotiation. Indeed, this is the very purpose of direct action. Nonviolent direct action seeks to create such a crisis and
...
INFO: Shutting down
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:4772 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:15180 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:44661 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:35802 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:39275 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:26696 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:55198 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:19113 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:1187 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
CRITICAL 09-18 02:18:58 launcher.py:98] AsyncLLMEngine is already dead, terminating server process
INFO: 10.0.116.118:10019 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [8]
INFO 09-18 02:18:58 server.py:228] vLLM ZMQ RPC Server was interrupted.
Future exception was never retrieved
future: <Future finished exception=RuntimeError('LLMEngine should not be pickled!')>
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner_base.py", line 112, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1589, in execute_model
output: SamplerOutput = self.model.sample(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 466, in sample
next_tokens = self.sampler(logits, sampling_metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/sampler.py", line 273, in forward
probs = torch.softmax(logits, dim=-1, dtype=torch.float)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.85 GiB. GPU 0 has a total capacity of 79.33 GiB of which 3.02 GiB is free. Process 308135 has 76.29 GiB memory in use. Of the allocated memory 75.34 GiB is allocated by PyTorch, with 31.38 MiB allocated in private pools (e.g., CUDA Graphs), and 79.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 115, in generate
async for request_output in results_generator:
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 859, in generate
async for output in await self.add_request(
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 106, in generator
raise result
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 48, in _log_task_completion
return_value = task.result()
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 733, in run_engine_loop
result = task.result()
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 673, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 340, in step_async
outputs = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/gpu_executor.py", line 185, in execute_model_async
output = await make_async(self.driver_worker.execute_model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 327, in execute_model
output = self.model_runner.execute_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner_base.py", line 125, in _wrapper
pickle.dump(dumped_inputs, filep)
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 563, in __reduce__
raise RuntimeError("LLMEngine should not be pickled!")
RuntimeError: LLMEngine should not be pickled!
I would like to know whether this is a bug in vLLM, where requests are not being queued and end up overloading the vLLM server, or whether the error is coming from somewhere else. Could anyone help with this case? Thank you very much in advance!
Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
I have the same issue when using lm-eval to evaluate LLMs. Have you solved it? My command and error info are as follows:
lm-eval --model vllm --model_args pretrained=/home/T3090U1/CZ/model/Qwen1.5-7B-Chat/,dtype=auto,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.9,max_model_len=4096 --tasks=leaderboard --batch_size=auto --output_path=/home/T3090U1/CZ/work3/output
error:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/bin/lm-eval", line 8, in <module>
[rank0]:     sys.exit(cli_evaluate())
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/__main__.py", line 369, in cli_evaluate
[rank0]:     results = evaluator.simple_evaluate(
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/utils.py", line 395, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/evaluator.py", line 277, in simple_evaluate
[rank0]:     results = evaluate(
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/utils.py", line 395, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/evaluator.py", line 444, in evaluate
[rank0]:     resps = getattr(lm, reqtype)(cloned_reqs)
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/api/model.py", line 370, in loglikelihood
[rank0]:     return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/models/vllm_causallms.py", line 415, in _loglikelihood_tokens
[rank0]:     outputs = self._model_generate(requests=inputs, generate=False)
[rank0]:   File "/home/T3090U1/CZ/work3/lm_eval/models/vllm_causallms.py", line 248, in _model_generate
[rank0]:     outputs = self.model.generate(
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/utils.py", line 1036, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 348, in generate
[rank0]:     outputs = self._run_engine(use_tqdm=use_tqdm)
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 715, in _run_engine
[rank0]:     step_outputs = self.llm_engine.step()
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 1223, in step
[rank0]:     outputs = self.model_executor.execute_model(
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/executor/distributed_gpu_executor.py", line 78, in execute_model
[rank0]:     driver_outputs = self._driver_execute_model(execute_model_req)
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/executor/multiproc_gpu_executor.py", line 162, in _driver_execute_model
[rank0]:     return self.driver_worker.execute_model(execute_model_req)
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/worker/worker_base.py", line 327, in execute_model
[rank0]:     output = self.model_runner.execute_model(
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/worker/model_runner_base.py", line 125, in _wrapper
[rank0]:     pickle.dump(dumped_inputs, filep)
[rank0]:   File "/home/T3090U1/anaconda3/envs/work3/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 563, in __reduce__
[rank0]:     raise RuntimeError("LLMEngine should not be pickled!")
[rank0]: RuntimeError: LLMEngine should not be pickled!
Sorry, but there is still no answer to this question.
Try starting vllm-openai with:
python3 -m vllm.entrypoints.openai.api_server \
--enable-lora \ # <-------
--api-key a0bxxxxxxxx8a
I tested this on qwen2.5-7b without any problems.
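Combining the suggestion above with the original reproduction, a launch sequence for dynamic LoRA serving might look like the following. This is a sketch, not a verified fix: the model name, port, adapter name, and paths are placeholders, and it assumes that both `--enable-lora` and `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True` are required for the `/v1/load_lora_adapter` endpoint to work:

```shell
# Allow adapters to be registered at runtime
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True

# Start the OpenAI-compatible server with LoRA support enabled
python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct \
    --enable-lora \
    --port 8000

# Then register the adapter dynamically (as in the original report)
curl -X POST http://localhost:8000/v1/load_lora_adapter \
    -H "Content-Type: application/json" \
    -d '{"lora_name": "my-adapter", "lora_path": "path/to/epoch_9"}'
```

Note that the original report used the base model's name (`meta-llama/Meta-Llama-3.1-8B-Instruct`) as the `lora_name`; giving the adapter a distinct name avoids ambiguity about which model a request targets.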
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!