Text classification notebook is broken
- Notebook shown here
- Loading the model in 4-bit:
model_name = "unsloth/Qwen3-4B-Base"
load_in_4bit = True
- Removing lm_head from the target modules, because otherwise I was getting this error:
# AssertionError: Backwards requires embeddings to be bf16 or fp16
model = FastLanguageModel.get_peft_model(
model,
r = 16,
target_modules = [
# "lm_head", # can easily be trained because it now has a small size
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
use_gradient_checkpointing = "unsloth",
random_state = 3407,
use_rslora = True, # We support rank stabilized LoRA
# init_lora_weights = 'loftq',
# loftq_config = LoftQConfig(loftq_bits = 4, loftq_iter = 1), # And LoftQ
)
print("trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
- Training goes fine, but the "# Update the model's lm_head weight and bias" cell throws an error:
AttributeError: 'Linear' object has no attribute 'modules_to_save'
I'm guessing this is because I removed the lm_head layer from training, so I commented out the line below:
# Update the model's lm_head weight and bias
with torch.no_grad():
new_lm_head_module = torch.nn.Linear(hidden_dim, old_size, bias=True, device=model.device)
new_lm_head_module.weight.data.copy_(new_lm_head)
new_lm_head_module.bias.data.copy_(new_lm_head_bias)
# model.lm_head.modules_to_save["default"] = new_lm_head_module
- During batch inference, it fails on this line:
pred = torch.argmax(probs).cpu().item()
Error:
RuntimeError Traceback (most recent call last)
/tmp/ipython-input-18-62861118.py in <cell line: 0>()
32 probs_all = F.softmax(last_logits, dim=-1)
33 probs = probs_all[number_token_ids] # only keep the logits for the number tokens
---> 34 pred = torch.argmax(probs).cpu().item()
35
36 true_label = row['label']
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
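Following the hint in the traceback, a minimal sketch for getting a synchronous (and therefore correctly located) error: set CUDA_LAUNCH_BLOCKING before torch initialises CUDA, for example in the very first cell after a runtime restart. This only makes the stack trace point at the kernel that actually failed; it does not fix anything by itself.

```python
import os

# Must be set before torch initialises CUDA, so run this first
# (after restarting the runtime) and only then import torch / unsloth.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import only after the variable above is set
```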
I'm not sure what's going wrong here, since my dataset has the same format (text and label columns), with labels in [1, 2, 3].
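Because CUDA errors are reported asynchronously, the assert may really come from the indexing on line 33 (probs_all[number_token_ids]) rather than the argmax on line 34, and an out-of-range index is a common cause of this assert. The lm_head comment in the get_peft_model call says the head "now has a small size", which suggests (an assumption, not confirmed here) that the notebook shrinks lm_head to the class tokens and that the skipped update is what restores it to the full vocabulary size. If so, last_logits would have only a few entries, and indexing it with vocabulary token ids would be out of range. Below is a plain-Python sketch of the same softmax, gather and argmax steps with an explicit bounds check. It assumes number_token_ids holds one token id per class, in class order; predict_label and class_labels are hypothetical names. It also makes explicit that argmax returns a position 0..k-1, so with labels 1, 2, 3 the prediction has to be mapped back to a label before comparing it with row['label'].

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(last_logits, number_token_ids, class_labels):
    """Hypothetical helper: pick a class from the logits of the class tokens.

    last_logits:      logits for the final position (one per output of lm_head)
    number_token_ids: token id for each class, in class order (assumption)
    class_labels:     label value for each class, same order (e.g. [1, 2, 3])
    """
    # An out-of-range id here is what a CUDA gather reports as a
    # device-side assert; in plain Python we raise a readable error instead.
    bad = [t for t in number_token_ids if not 0 <= t < len(last_logits)]
    if bad:
        raise IndexError(f"token ids {bad} out of range for {len(last_logits)} logits")
    probs_all = softmax(last_logits)
    probs = [probs_all[t] for t in number_token_ids]
    # argmax gives a *position* in number_token_ids (0..k-1), not the label
    pos = max(range(len(probs)), key=probs.__getitem__)
    return class_labels[pos]
```

For example, with a 10-entry logit vector whose largest value is at token id 5, predict_label(logits, [4, 5, 6], [1, 2, 3]) returns 2, while passing a 3-entry logit vector (as a shrunken head would produce) raises IndexError instead of a CUDA assert.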