GRPO example: "'PEFTHelper' object has no attribute 'validate_legal'"
I am following along with the colab notebook at: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(3B)-GRPO.ipynb#scrollTo=vzOuSVCL_GA9 and after training 1 step the run crashes with the AttributeError in the title. Full output:

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.2.5: Fast Qwen2 patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla V100S-PCIE-32GB. Max memory: 31.733 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: vLLM loading /models/Qwen2.5-1.5B-Instruct with actual GPU utilization = 48.42%
Unsloth: Your GPU has CUDA compute capability 7.0 with VRAM = 31.73 GB.
Unsloth: Using conservativeness = 1.0. Chunked prefill tokens = 1024. Num Sequences = 256.
Unsloth: vLLM's KV Cache can use up to 12.47 GB. Also swap space = 6 GB.
WARNING 02-13 08:47:39 config.py:2276] Casting torch.bfloat16 to torch.float16.
INFO 02-13 08:47:45 config.py:510] This model supports multiple tasks: {'generate', 'classify', 'reward', 'score', 'embed'}. Defaulting to 'generate'.
INFO 02-13 08:47:45 llm_engine.py:234] Initializing an LLM engine (v0.6.6) with config: model='/models/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='/models/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/models/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 02-13 08:47:46 selector.py:217] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 02-13 08:47:46 selector.py:129] Using XFormers backend.
[W213 08:47:46.344152759 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
INFO 02-13 08:47:46 model_runner.py:1094] Starting to load model /models/Qwen2.5-1.5B-Instruct...
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.90it/s]
INFO 02-13 08:47:47 model_runner.py:1099] Loading model weights took 2.8860 GB
INFO 02-13 08:47:47 punica_selector.py:11] Using PunicaWrapperGPU.
INFO 02-13 08:47:48 worker.py:241] Memory profiling takes 1.07 seconds
INFO 02-13 08:47:48 worker.py:241] the current vLLM instance can use total_gpu_memory (31.73GiB) x gpu_memory_utilization (0.48) = 15.36GiB
INFO 02-13 08:47:48 worker.py:241] model weights take 2.89GiB; non_torch_memory takes 0.12GiB; PyTorch activation peak memory takes 1.40GiB; the rest of the memory reserved for KV Cache is 10.95GiB.
INFO 02-13 08:47:48 gpu_executor.py:76] # GPU blocks: 25631, # CPU blocks: 14043
INFO 02-13 08:47:48 gpu_executor.py:80] Maximum concurrency for 1024 tokens per request: 400.48x
INFO 02-13 08:47:51 model_runner.py:1415] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:19<00:00, 1.81it/s]
INFO 02-13 08:48:10 model_runner.py:1535] Graph capturing finished in 19 secs, took 0.38 GiB
INFO 02-13 08:48:10 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 23.31 seconds
Unsloth 2025.2.5 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
torch.distributed process group is initialized, but parallel_mode != ParallelMode.DISTRIBUTED. In order to use Torch DDP, launch your script with `python -m torch.distributed.launch`
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 7,473 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 1
\        /    Total batch size = 1 | Total steps = 250
 "-____-"     Number of trainable parameters = 9,232,384
  0%|          | 0/250 [00:00<?, ?it/s]
-------------------- Question:
Ahmed and Emily are having a contest to see who can get the best grade in the class. There have been 9 assignments and Ahmed has a 91 in the class. Emily has a 92. The final assignment is worth the same amount as all the other assignments. Emily got a 90 on the final assignment. What is the minimum grade Ahmed needs to get to beat Emily if all grades are whole numbers?
Answer:
100
Response:
To determine the minimum grade Ahmed needs to beat Emily, we first need to calculate the total possible grade Ahmed can get in the class.
Let's assume the maximum grade a student can get is 100. If there have been 9 assignments and each is worth the same, let's denote the total possible grade Ahmed can get as \( A \).
\[ A = 100 \times 9 = 900 \]
Ahmed has already scored 91 out of 900. Let \( x \) represent the minimum grade Ahmed needs to beat Emily.
Emily has scored 92 on the first 8 assignments and 90 on the final assignment. The total possible grade for Emily is 900 as well, so we can write her overall grade:
\[ 91 + 92 + 92 + 92 + 92 + 92 + 92 + 92 + 90 =
Extracted:
To determine the minimum grade Ahmed needs to beat Emily, we first need to calculate the total possible grade Ahmed can get in the class.
Let's assume the maximum grade a student can get is 100. If there have been 9 assignments and each is worth the same, let's denote the total possible grade Ahmed can get as \( A \).
\[ A = 100 \times 9 = 900 \]
Ahmed has already scored 91 out of 900. Let \( x \) represent the minimum grade Ahmed needs to beat Emily.
Emily has scored 92 on the first 8 assignments and 90 on the final assignment. The total possible grade for Emily is 900 as well, so we can write her overall grade:
\[ 91 + 92 + 92 + 92 + 92 + 92 + 92 + 92 + 90 =
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 2.0000000000000002e-07, 'rewards/xmlcount_reward_func': 0.0, 'rewards/soft_format_reward_func': 0.0, 'rewards/strict_format_reward_func': 0.0, 'rewards/int_reward_func': 0.0, 'rewards/correctness_reward_func': 0.0, 'reward': 0.0, 'reward_std': 0.0, 'completion_length': 182.25, 'kl': 0.0, 'epoch': 0.0}
  0%|▊         | 1/250 [00:03<12:58, 3.13s/it]
INFO 02-13 08:48:24 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20250213-084824.pkl...
INFO 02-13 08:48:24 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20250213-084824.pkl.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1632, in execute_model
[rank0]:     self.set_active_loras(model_input.lora_requests,
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1344, in set_active_loras
[rank0]:     self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 183, in set_active_adapters
[rank0]:     set_active_adapters_worker(requests, mapping, self._apply_adapters,
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/vllm/adapter_commons/utils.py", line 52, in set_active_adapters_worker
[rank0]:     apply_adapters_func(requests)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 243, in _apply_adapters
[rank0]:     self.add_adapter(lora)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 251, in add_adapter
[rank0]:     lora = self._load_adapter(lora_request)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 157, in _load_adapter
[rank0]:     raise e
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/vllm_lora_worker_manager.py", line 110, in _load_adapter
[rank0]:     peft_helper.validate_legal(self.lora_config)
[rank0]: AttributeError: 'PEFTHelper' object has no attribute 'validate_legal'
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/distill_model.py", line 74, in <module>
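For reproduction, here is a trimmed sketch of what my distill_model.py does; it follows the notebook's flow. The model path, sequence length, and GPU utilization match my run, but the dataset prep and the single reward function below are simplified stand-ins for the notebook's GSM8K chat formatting and its five reward functions, so treat those parts as placeholders:

from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

max_seq_length = 1024

# Load the base model with Unsloth's vLLM-backed fast inference
# (the code path that crashes).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/models/Qwen2.5-1.5B-Instruct",  # local copy of the hub model
    max_seq_length = max_seq_length,
    load_in_4bit = False,
    fast_inference = True,          # enables the vLLM engine seen in the logs
    max_lora_rank = 16,
    gpu_memory_utilization = 0.5,   # logged as "actual GPU utilization = 48.42%"
)

# Attach the LoRA adapter that vLLM later fails to load.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    use_gradient_checkpointing = "unsloth",
)

# Stand-in dataset prep: GSM8K train split (7,473 rows), plain-string
# prompts instead of the notebook's chat-format prompts.
dataset = load_dataset("openai/gsm8k", "main", split = "train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def length_reward(completions, **kwargs):
    # Placeholder reward so the sketch is self-contained; my script uses the
    # notebook's five rewards (xmlcount, soft/strict format, int, correctness).
    return [len(c) / 1000.0 for c in completions]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [length_reward],
    args = GRPOConfig(
        use_vllm = True,
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 1,
        num_generations = 8,
        max_prompt_length = 256,
        max_completion_length = 200,
        max_steps = 250,
        logging_steps = 1,
        output_dir = "outputs",
    ),
    train_dataset = dataset,
)
trainer.train()  # crashes after the first step with the AttributeError above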
Environment:
OS: Ubuntu
GPU: Tesla V100S-PCIE-32GB
triton 3.1.0
trl 0.15.0.dev0
truststore 0.8.0
typeguard 4.4.1
types-dataclasses 0.6.6
typing_extensions 4.12.2
tyro 0.9.14
tzdata 2025.1
unsloth 2025.2.5
unsloth_zoo 2025.2.3
urllib3 1.26.18
uvicorn 0.34.0
uvloop 0.21.0
virtualenv 20.29.2
vllm 0.6.6
peft 0.14.0
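If it helps triage: the method really does seem to be absent from this vLLM build, i.e. unsloth_zoo 2025.2.3 appears to expect a newer vLLM than 0.6.6. A quick check (the import path here is my inference from the traceback, not something I have verified against the vLLM source):

# Hypothetical sanity check: does the installed vLLM's PEFTHelper
# expose validate_legal? Prints False in this environment (vllm 0.6.6).
from vllm.lora.peft_helper import PEFTHelper  # path inferred from the traceback
print(hasattr(PEFTHelper, "validate_legal"))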