Text classification notebook is broken

Open imvaibhav28 opened this issue 5 months ago • 0 comments

Notebook shown here
Loading a model in 4 bit

model_name = "unsloth/Qwen3-4B-Base"
load_in_4bit = True

I also removed lm_head from the target modules, because with it included I was getting this error: AssertionError: Backwards requires embeddings to be bf16 or fp16

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = [
        # "lm_head", # can easily be trained because it now has a small size
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    # init_lora_weights = 'loftq',
    # loftq_config = LoftQConfig(loftq_bits = 4, loftq_iter = 1), # And LoftQ
)
print("trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))

Training goes fine, but the "# Update the model's lm_head weight and bias" cell throws AttributeError: 'Linear' object has no attribute 'modules_to_save'. I'm guessing this is because I removed lm_head from the trained modules, so I commented out the last line below:

# Update the model's lm_head weight and bias
with torch.no_grad():
    new_lm_head_module = torch.nn.Linear(hidden_dim, old_size, bias=True, device=model.device)
    new_lm_head_module.weight.data.copy_(new_lm_head)
    new_lm_head_module.bias.data.copy_(new_lm_head_bias)
    # model.lm_head.modules_to_save["default"] = new_lm_head_module

Then, during batch inference, the line pred = torch.argmax(probs).cpu().item() fails with:

RuntimeError                              Traceback (most recent call last)
/tmp/ipython-input-18-62861118.py in <cell line: 0>()
     32        probs_all = F.softmax(last_logits, dim=-1)
     33        probs = probs_all[number_token_ids] # only keep the logits for the number tokens
---> 34        pred = torch.argmax(probs).cpu().item()
     35 
     36        true_label = row['label']

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
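
The traceback itself suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of how that could be set from inside a notebook follows. It assumes the variable is set in a fresh runtime before torch is imported, since CUDA reads it when it initializes. With synchronous launches, the reported line should be the one that actually failed, rather than the later .cpu().item() call where the error happened to surface.

```python
import os

# Make CUDA kernel launches synchronous so that the Python traceback
# points at the operation that actually triggered the assert.
# Assumption: this runs in a fresh runtime, before torch touches CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
```
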

I'm not sure what's going wrong here, as my dataset has the same format (text and label columns), with the labels taking the values 1, 2 and 3.
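
One common cause of a device-side assert is an index that falls outside the tensor being indexed, so a possible sanity check is to compare number_token_ids against the size of the last dimension of the logits. The helper below is a hypothetical sketch, not part of the notebook: out_of_range_ids is an illustrative name, and it works on plain Python ints so it can run on the CPU without touching the GPU.

```python
def out_of_range_ids(token_ids, dim_size):
    """Return the ids that cannot index a dimension of size dim_size.

    Negative ids are reported as invalid too, because token ids are
    expected to be non-negative.
    """
    return [i for i in token_ids if not (0 <= i < dim_size)]


# Made-up numbers for illustration only: three ids checked against a
# dimension of size 3, then against a dimension large enough for all.
bad = out_of_range_ids([16, 17, 18], 3)
ok = out_of_range_ids([16, 17, 18], 100)
print(bad)  # [16, 17, 18]
print(ok)   # []

# In the notebook this might be called as follows, assuming
# number_token_ids is a list of ints (use .tolist() if it is a tensor):
# out_of_range_ids(number_token_ids, probs_all.shape[-1])
```

If the returned list is non-empty, the indexing step probs_all[number_token_ids] would be reading past the end of the logits, which would be consistent with the assert above.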

Jul 21 '25 01:07 imvaibhav28