Richard Sun

4 comments by Richard Sun

I am also confused about why we can calculate all the attention scores for the source sentence using the previous hidden state and current input embedding.

I also ran into this problem with 4 A100s, even with a small batch size.

I set the cpu_offload option to True for the FSDP strategy (https://lightning.ai/docs/pytorch/2.0.0/_modules/lightning/pytorch/strategies/fsdp.html), and training was able to continue. But I am not sure how long it would take and whether...

Hi rasbt, thanks very much for sharing this project. I can run llama-lora on my local server without much struggle. Is it possible to fine-tune the 65B model on two...