Richard Sun comments

Results 4 comments of Richard Sun

Question about Luong Attention Implementation

I am also confused about why we can calculate all the attention scores for the source sentence using the previous hidden state and current input embedding.

full finetuning of LLaMA 7B: OOM on A100

I also ran into this problem with 4 A100s, even with a small batch size.
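
For context, a back-of-the-envelope estimate suggests why batch size alone may not help here. The sketch below assumes fp32 weights, fp32 gradients, and an Adam-style optimizer with two fp32 moment buffers per parameter; the comment above does not state the precision or optimizer, so these are assumptions, and the function name is hypothetical. Activations and temporary buffers come on top of this figure.

```python
def full_finetune_state_gib(n_params, bytes_weights=4, bytes_grads=4, bytes_optimizer=8):
    """Memory for weights + gradients + optimizer state, in GiB.

    Defaults assume fp32 weights and gradients (4 bytes each) and an
    Adam-style optimizer keeping two fp32 moments (8 bytes) per parameter.
    These are illustrative assumptions, not settings taken from the thread.
    """
    total_bytes = n_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / 2**30


n_params = 7_000_000_000  # nominal "7B"; the exact parameter count differs slightly
n_gpus = 4                # as in the comment above

total_gib = full_finetune_state_gib(n_params)
per_gpu_gib = total_gib / n_gpus  # only if all three kinds of state are fully sharded

print(f"model state, unsharded:         {total_gib:.1f} GiB")
print(f"model state per GPU if sharded: {per_gpu_gib:.1f} GiB")
```

Under these assumptions the model state alone is roughly 104 GiB, more than a single 40 GB or 80 GB A100 holds, and it is independent of batch size. Batch size only affects the activation memory added on top.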

full finetuning of LLaMA 7B: OOM on A100

I set the cpu_offload option to True (https://lightning.ai/docs/pytorch/2.0.0/_modules/lightning/pytorch/strategies/fsdp.html) for the FSDP strategy, and the training process could continue. But I am not sure how long it would take and whether...

Finetune LLAMA-65B using LoRA

Hi rasbt, thanks very much for sharing this project. I can run llama-lora on my local server without much struggle. Is it possible to fine-tune the 65B model on two...