pawkanarek
I used the `trainer.save_pretrained` function mentioned in PR https://github.com/huggingface/transformers/pull/29388 but it didn't change anything - the trained model after saving is still exactly the same as before training.
I think that I fixed it, but I wouldn't recommend this fix to anyone, so I'm not even thinking about making a PR. It's a patch rather than a fix, but I...
@shub-kris thanks,

> @PawKanarek just to isolate the error, what happens if you run the same code on a GPU instead of TPU?

I don't have a GPU capable of training...
@moficodes I think you misunderstood my intentions. I want to save a standalone model, not just the LoRA adapter. You saved only the LoRA adapter (with `trainer.save_model()`), but I...
Thank you @shub-kris ! I will run this script on my local machine and then I will share the results. I have one question regarding your code: why do...
I think that my original method for comparing weights was broken. When I accessed the parameters with `params1 = model1.parameters()`, the method returned an iterator, and it will...
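The pitfall above can be reproduced in a few lines of plain PyTorch: `parameters()` returns a generator, so iterating it twice (e.g. once per comparison) silently yields nothing the second time, making two different models appear "equal". The toy model and the snapshot-based fix below are my own illustration, not the thread's actual script.

```python
import torch

# Toy model just to demonstrate the iterator behaviour.
model = torch.nn.Linear(4, 2)

params = model.parameters()       # this is a generator, not a list
first_pass = list(params)         # consumes the generator: weight + bias
second_pass = list(params)        # generator is exhausted -> empty!
print(len(first_pass), len(second_pass))  # 2 0

# A reliable before/after comparison snapshots the tensors instead:
before = {name: p.detach().clone() for name, p in model.named_parameters()}
with torch.no_grad():
    model.weight.add_(1.0)        # simulate a training update
changed = any(
    not torch.equal(before[name], p)
    for name, p in model.named_parameters()
)
print(changed)  # True: the snapshot correctly detects the update
```

Calling `model.named_parameters()` fresh for each pass (or materialising the iterator into a dict first) avoids the exhausted-generator trap entirely.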
> Is this happening when you're loading a saved model?

@amyeroberts No, I copied that warning message from @zorrofox's comment https://github.com/huggingface/transformers/issues/29659#issuecomment-2007343622, but I remember that I also experienced this...
@shub-kris with FSDP commented out and the batch size reduced to `batch_size=1`, I could finally see a properly fine-tuned model without any warnings. Output:

```
(v_xla) raix@t1v-n-3a1a9ef8-w-0:~/minefinetune$ cd /home/raix/minefinetune ; /usr/bin/env...
```
Hi @michaelmoynihan, I also get the `Failed to get global TPU topology` error on a TPU v4-8, so I followed your advice:

> What I would recommend first is trying `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_tpuvm_20240226` and...
I tried to run this script on a TPU v3-8 and, with slight modifications to the script (I switched to the smaller Gemma-2b model because of the resource_exhausted bug), I could start my...