Yonghao Zhuang

Results: 21 comments by Yonghao Zhuang

Thanks for your willingness to help the community, we sincerely appreciate it! In my experience with OOM, I basically enable gradient checkpointing, and then I can train with...
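
A minimal sketch of what I mean (the checkpoint name is illustrative, not the exact FastChat invocation): gradient checkpointing recomputes activations in the backward pass, trading extra compute for a large cut in activation memory.

```python
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; substitute whatever base model you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
model.gradient_checkpointing_enable()  # recompute activations during backward
model.config.use_cache = False  # the KV cache is incompatible with checkpointing while training
```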

Please check [here](https://github.com/lm-sys/FastChat/blob/4f4637832981da418c064546a2ec3a12d6ab9399/fastchat/train/train_lora.py#L64-L84) if you only want to store the LoRA adapter part. Basically, the state dict contains every weight, including those that are not trainable in LoRA, so you...
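
The spirit of the linked snippet, as a hedged sketch: filter the full state dict down to the adapter weights, assuming PEFT-style parameter names that contain `lora_` (`model` here is your LoRA-wrapped model, and the output path is illustrative).

```python
import torch

full_state_dict = model.state_dict()
# Keep only the LoRA adapter tensors; everything else is the frozen base model.
lora_state_dict = {k: v for k, v in full_state_dict.items() if "lora_" in k}
torch.save(lora_state_dict, "lora_adapter.bin")
```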

We've tested that the current script runs on multiple GPUs on a single machine (8 × 40GB A100) using DeepSpeed ZeRO-3. The multi-node case is not tested yet; it's a work in progress.
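
For orientation only, a hedged sketch of wiring ZeRO-3 into a Hugging Face `TrainingArguments` as an inline dict; this is not the tested FastChat configuration, and the values are illustrative.

```python
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 3},  # ZeRO-3: shard params, grads, and optimizer state
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
}
training_args = TrainingArguments(output_dir="./out", deepspeed=ds_config)
```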

Please check [here](https://github.com/lm-sys/FastChat/pull/138) (note that the given configuration may not work; it's just an example of the use case).

For OOM, please try adding [these lines](https://github.com/tloen/alpaca-lora/blob/8bb8579e403dc78e37fe81ffbb253c413007323f/finetune.py#L114-L115) from Alpaca-LoRA. I'll open a PR if that works.

A takeaway if you want to implement the weight saving yourself: https://github.com/lm-sys/FastChat/blob/4d33cde2322544532ab940ed1ece1f82d77fe18c/fastchat/train/train_lora.py#L55-L60 But I think HF should have the same code.
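
An assumption-laden reconstruction of the idea, not a copy of the linked code: under ZeRO-3 each parameter is sharded across ranks, so you gather it before reading its full value.

```python
import deepspeed

gathered = {}
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue  # skip frozen base-model weights; we only want the trainable ones
    # Temporarily materialize the full parameter from its ZeRO-3 shards.
    with deepspeed.zero.GatheredParameters([param]):
        gathered[name] = param.data.detach().cpu().clone()
```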

Maybe you can try printing the shape and size of the tensors in the `state_dict` to check what happens. Typically, for an ungathered ZeRO-3 tensor, there is only a...
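
A quick diagnostic sketch: ungathered ZeRO-3 parameters typically show up as zero-sized placeholders, so printing shapes makes them easy to spot.

```python
for name, tensor in model.state_dict().items():
    # An ungathered ZeRO-3 tensor usually prints as shape () or (0,) with numel 0.
    print(name, tuple(tensor.shape), tensor.numel())
```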

I found a bug in the `train_lora.py` example. A hotfix is to use `model.named_parameters()` instead of `state_dict()`. Note that `named_parameters()` returns an iterable instead of a dict. Please...
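
A hedged sketch of the hotfix: `named_parameters()` lazily yields `(name, Parameter)` pairs, so materialize it into a dict if the downstream code expects dict-style access.

```python
# Build a dict from the iterator; filter to trainable params as in the LoRA case.
trainable = {name: p for name, p in model.named_parameters() if p.requires_grad}
```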

We have no plan, mainly because 1) the TPU backend of XLA is closed source; and 2) unlike NCCL for GPUs, TPUs expose no communication library.