Yonghao Zhuang

Results: 21 comments by Yonghao Zhuang

Thanks for your willingness to help the community, we sincerely appreciate it! In my experience with OOM, I basically enable gradient checkpointing, and then I can train with...
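
A minimal sketch of what I mean (the checkpoint name is illustrative, not the exact FastChat invocation): gradient checkpointing recomputes activations in the backward pass, trading extra compute for a large cut in activation memory.

```python
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; substitute whatever base model you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
model.gradient_checkpointing_enable()  # recompute activations during backward
model.config.use_cache = False  # the KV cache is incompatible with checkpointing while training
```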

Please check [here](https://github.com/lm-sys/FastChat/blob/4f4637832981da418c064546a2ec3a12d6ab9399/fastchat/train/train_lora.py#L64-L84) if you only want to store the LoRA adapter part. Basically, the state dict contains every weight, including those that are not trainable in LoRA, so you...
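
The spirit of the linked snippet, as a hedged sketch: filter the full state dict down to the adapter weights, assuming PEFT-style parameter names that contain `lora_` (`model` here is your LoRA-wrapped model, and the output path is illustrative).

```python
import torch

full_state_dict = model.state_dict()
# Keep only the LoRA adapter tensors; everything else is the frozen base model.
lora_state_dict = {k: v for k, v in full_state_dict.items() if "lora_" in k}
torch.save(lora_state_dict, "lora_adapter.bin")
```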

We've tested that the current script runs on multiple GPUs on a single machine (8 × 40GB A100) using DeepSpeed ZeRO-3. The multi-node case is not tested yet; it's a work in progress.
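
For orientation only, a hedged sketch of wiring ZeRO-3 into a Hugging Face `TrainingArguments` as an inline dict; this is not the tested FastChat configuration, and the values are illustrative.

```python
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 3},  # ZeRO-3: shard params, grads, and optimizer state
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
}
training_args = TrainingArguments(output_dir="./out", deepspeed=ds_config)
```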

Please check [here](https://github.com/lm-sys/FastChat/pull/138) (note that the given configuration may not work; it's just an example of the use case).

For OOM, please try adding [these lines](https://github.com/tloen/alpaca-lora/blob/8bb8579e403dc78e37fe81ffbb253c413007323f/finetune.py#L114-L115) from Alpaca-LoRA. I'll open a PR if that works.

A takeaway if you want to implement the weight saving yourself: https://github.com/lm-sys/FastChat/blob/4d33cde2322544532ab940ed1ece1f82d77fe18c/fastchat/train/train_lora.py#L55-L60 But I think HF should have the same code.
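
An assumption-laden reconstruction of the idea, not a copy of the linked code: under ZeRO-3 each parameter is sharded across ranks, so you gather it before reading its full value.

```python
import deepspeed

gathered = {}
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue  # skip frozen base-model weights; we only want the trainable ones
    # Temporarily materialize the full parameter from its ZeRO-3 shards.
    with deepspeed.zero.GatheredParameters([param]):
        gathered[name] = param.data.detach().cpu().clone()
```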

Maybe you can try printing the shape and size of the tensors in the `state_dict` to check what happens. Typically, for an ungathered ZeRO-3 tensor, there is only a...
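
A quick diagnostic sketch: ungathered ZeRO-3 parameters typically show up as zero-sized placeholders, so printing shapes makes them easy to spot.

```python
for name, tensor in model.state_dict().items():
    # An ungathered ZeRO-3 tensor usually prints as shape () or (0,) with numel 0.
    print(name, tuple(tensor.shape), tensor.numel())
```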

I found a bug in the `train_lora.py` example. A hotfix is to use `model.named_parameters()` instead of `state_dict()`. Note that `named_parameters()` returns an iterable instead of a dict. Please...
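
A hedged sketch of the hotfix: `named_parameters()` lazily yields `(name, Parameter)` pairs, so materialize it into a dict if the downstream code expects dict-style access.

```python
# Build a dict from the iterator; filter to trainable params as in the LoRA case.
trainable = {name: p for name, p in model.named_parameters() if p.requires_grad}
```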

We have no plan, mainly because 1) the TPU backend of XLA is closed source; and 2) unlike NCCL for GPUs, TPUs expose no communication library.