Jeremy Cochoy
> @kossnick Can you please try converting the model using the change in PR #524? I had exactly this problem with my "home made" model. Forcing the rank...
This may be the source of the problem, but I don't know whether the previous implementation (`rank = len(graph.shape_dict[node.inputs[0]])`) would work with the current code base. Unfortunately I don't have the...
@breizhn Unfortunately I got a little overwhelmed by work and didn't make progress on the fft/ifft operators PR. I should definitely resume this. But this is only for the specification part....
That's indeed what I have done, but this seems to be insufficient to run the original LoRA configuration. I was able to reproduce the original LoRA training from the original...
Thanks. I will have a look this evening and keep you updated 👍
I tried the latest head. The code does seem to run (i.e. what I got when I copy-pasted the missing functions into the file); however, I immediately get an...
> If there is only one GPU, maybe you can directly run `train_lora.py` without FSDP (in case it's FS-Data-Parallel). Besides, as mentioned [here](https://github.com/lm-sys/FastChat/blob/main/fastchat/train/train_lora.py#L97-L101), gradient checkpointing with LoRA needs a monkey patch...
I just tried to compile it on an Ubuntu VM and didn't get any error. But I noticed that the README.md was unclear. Did you create a `build`...