Yonghao Zhuang

Results 21 comments of Yonghao Zhuang

Does memory management require some CPU concurrency? E.g., a buffer's last use is a SEND and that instruction has just been launched, while the next instruction is a FREE. Alpa does not actually...
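As an illustration of the hazard (a hedged sketch using `torch.distributed`, not Alpa's actual SEND/FREE instructions): an asynchronous send has to complete before the buffer it reads from can be released, otherwise the free races with the in-flight communication.

```python
# Illustrative sketch only -- not Alpa's runtime instructions.
# Assumes a torch.distributed process group is already initialized (e.g. NCCL).
import torch
import torch.distributed as dist

def send_then_free(buffer: torch.Tensor, dst: int):
    # SEND: launch the communication asynchronously.
    work = dist.isend(buffer, dst=dst)

    # FREE: releasing `buffer` right away would race with the in-flight send,
    # so some CPU-side synchronization is needed first.
    work.wait()   # block until the send has consumed the buffer
    del buffer    # now it is safe to drop the last reference
```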

The key point for swapping in XLA is that all parameters should already be on the GPU when launching an XlaExecutable. To address this: - When the model is not very...
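A minimal sketch of that constraint using plain JAX rather than Alpa's internals (the function and array names are illustrative): every parameter must be placed on the GPU device before the compiled executable is invoked.

```python
# Hedged sketch: illustrates "params must live on GPU before launch",
# not Alpa's actual swapping logic.
import jax
import jax.numpy as jnp

@jax.jit
def forward(params, x):
    return x @ params["w"] + params["b"]

gpu = jax.devices("gpu")[0]

# Parameters staged elsewhere (e.g. swapped out to host memory).
host_params = {
    "w": jnp.ones((1024, 1024)),
    "b": jnp.zeros((1024,)),
}

# Swap in: move all parameters to the GPU *before* launching the executable.
gpu_params = jax.device_put(host_params, gpu)
x = jax.device_put(jnp.ones((8, 1024)), gpu)

out = forward(gpu_params, x)  # the XlaExecutable only sees on-GPU buffers
```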

> ## CPU Compute Runtime
>
> - Add a global config to choose the runtime ("cpu" or "gpu") (https://github.com/alpa-projects/alpa/blob/main/alpa/global_env.py)
> - Replace all hard-coded "GPU" with that global configuration...
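A hedged sketch of what such a switch could look like (the `backend` attribute and its use are hypothetical and do not reflect the actual contents of `alpa/global_env.py`):

```python
# Hypothetical sketch of a global runtime switch; names are illustrative.
from dataclasses import dataclass

import jax

@dataclass
class GlobalConfig:
    # "gpu" or "cpu"; replaces hard-coded "gpu" strings elsewhere in the code.
    backend: str = "gpu"

global_config = GlobalConfig()

def get_backend_devices():
    # Code that previously called jax.devices("gpu") directly would consult
    # the global configuration instead of a hard-coded platform name.
    return jax.devices(global_config.backend)
```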

lgtm. Have you tested the new checkpoint size and reloading from the new checkpoint?

Good to hear that `pytorch_model.bin` can be removed. The size of `global_stepx` should also be reduced to the size of the adapters, but currently it also includes the backbone parameters (in...
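One way to verify what a checkpoint actually contains (a hedged sketch; the file path below is illustrative) is to load it and list the keys with their parameter counts:

```python
# Hedged sketch for inspecting a checkpoint's contents; the path is illustrative.
import torch

state_dict = torch.load("checkpoint/adapter_model.bin", map_location="cpu")

total = 0
for name, tensor in state_dict.items():
    total += tensor.numel()
    # An adapter-only checkpoint should contain essentially only "lora_" keys;
    # backbone weight keys showing up here means the full model was saved too.
    print(name, tuple(tensor.shape))

print(f"total parameters in checkpoint: {total:,}")
```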

Is this PR still ongoing? It seems that resuming from the LoRA adapter still does not restore the optimizer states and RNG states.

The current checkpoint from huggingface/deepspeed does not support this functionality. To store only the LoRA weights after the whole training process, see [here](https://github.com/lm-sys/FastChat/blob/4960ca702c66b9adaa65945746dba34f8d2c8ddc/fastchat/train/train_lora.py#L66). Maybe one can monkey-patch the HF trainer's/deepspeed's...
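For reference, a hedged sketch of extracting only the LoRA weights from a PEFT-wrapped model at the end of training (this mirrors the idea in the linked `train_lora.py`, but is not a copy of it):

```python
# Hedged sketch: save only the LoRA adapter weights at the end of training.
# Assumes `model` is a peft.PeftModel and `output_dir` is chosen by you.
import torch
from peft import get_peft_model_state_dict

def save_lora_only(model, output_dir: str):
    # Collect only the adapter parameters (lora_A / lora_B, etc.),
    # not the frozen backbone weights. Note: under DeepSpeed ZeRO-3 the
    # parameters may need to be gathered from the shards first.
    lora_state = get_peft_model_state_dict(model)
    torch.save(lora_state, f"{output_dir}/adapter_model.bin")
```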

Does your GPU support bfloat16? If not, please try to remove `--bf16 True`.
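A quick hedged check for bfloat16 support (PyTorch exposes a helper for this):

```python
# Quick check for bfloat16 support on the current CUDA device.
import torch

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    print("GPU supports bfloat16; keeping --bf16 True is fine.")
else:
    print("No bfloat16 support; remove --bf16 True (or use --fp16 True instead).")
```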

The `model.dtype` controls the output dtype of each layer; please print out the dtype of [this tensor](https://github.com/lm-sys/FastChat/blob/0e958b852a14f4bef5f0e9d7a5e7373477329cf2/fastchat/train/llama_flash_attn_monkey_patch.py#L30). If it is f32, please further check `self.q_proj.dtype`, which is supposed to be...
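A hedged debugging sketch that inspects the relevant dtypes from outside the monkey patch (the model path is illustrative; inside the patched forward you would instead print `hidden_states.dtype` and `self.q_proj.weight.dtype` directly):

```python
# Hedged sketch: check a loaded LLaMA model's dtypes; the path is illustrative.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama",            # illustrative path
    torch_dtype=torch.float16,  # the dtype you intend to train/serve with
)

print("model.dtype:", model.dtype)
layer0_attn = model.model.layers[0].self_attn
print("q_proj weight dtype:", layer0_attn.q_proj.weight.dtype)  # expect float16/bfloat16
```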

So you need to set the dtype of the model to float16/bfloat16. If you are using the train_lora script, I think you need to add these lines to your deepspeed...
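The comment is truncated, but the usual place for this is the fp16/bf16 sections of the DeepSpeed config. A hedged sketch as a Python dict that can be passed to the HF Trainer (or dumped to a JSON file); the exact values depend on your setup:

```python
# Hedged sketch of a DeepSpeed config enabling half precision; pass it to
# transformers.TrainingArguments(deepspeed=ds_config) or write it to a JSON file.
ds_config = {
    "bf16": {"enabled": "auto"},   # follows --bf16 from the command line
    "fp16": {"enabled": "auto"},   # follows --fp16 from the command line
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```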