Request for the training code
Hi, thank you for your excellent work. Do you have any plans to share the training code? I tried to reproduce the training, but it raises the following error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
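For anyone hitting this before the training code is released: below is a minimal, generic PyTorch sketch of the most common cause of this error, a tensor carried across steps that keeps the previous step's graph alive. It is only an illustration under that assumption, not the repository's actual training loop.

```python
import torch

# Generic reproduction (not the xLoRA training loop): a value carried across
# iterations keeps step 0's graph reachable, so step 1's backward() walks
# into buffers that were already freed by the first backward().
w = torch.randn(4, requires_grad=True)
state = torch.zeros(4)

for step in range(2):
    state = torch.tanh(state + w)   # step 1's graph references step 0's
    loss = (state ** 2).sum()
    loss.backward()                 # raises the RuntimeError above on step 1

# Fix: detach the carried value so each step builds a self-contained graph,
# e.g. state = torch.tanh(state.detach() + w) inside the loop.
```

Passing `retain_graph=True` also silences the error, but it usually just hides a value that should have been detached and grows memory use over time.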
Me too.
I also tried to run a training script using the `Trainer` class from Hugging Face and hit several issues and errors, including (a workaround sketch follows the list):
- Diverging tensor devices (LoRA weights created on CPU while the model is on GPU)
- Different tensor dtypes (multiplication between float and bfloat16 when the model is in bfloat16)
- Missing peft configs in the `xLoRAConfig` class
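In case it helps others, here is a small self-contained sketch of how the first two issues can be avoided in plain PyTorch: when the adapter weights are registered submodules, a single `.to()` call aligns their device and dtype with the base model. The `Adapted` class and the `lora_A`/`lora_B` names are stand-ins of mine, not xLoRA's actual attributes or API.

```python
import torch
import torch.nn as nn

# Stand-in adapted module: `lora_A`/`lora_B` are hypothetical names.
class Adapted(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(8, 8)
        self.lora_A = nn.Linear(8, 4, bias=False)  # imagine: created on CPU, float32
        self.lora_B = nn.Linear(4, 8, bias=False)

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Adapted()

# Because the adapters are registered submodules, one .to() call moves every
# parameter to the same device AND dtype, avoiding both the CPU-vs-GPU and
# the float32-vs-bfloat16 mixes at multiplication time.
model.to(device=device, dtype=torch.bfloat16)

x = torch.randn(2, 8, device=device, dtype=torch.bfloat16)
print(model(x).device, model(x).dtype)
```

If the library creates the adapters after the model is already on the GPU, re-running `model.to(device, dtype)` once before constructing the `Trainer` should have the same effect.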