Yonghao Zhuang

Results 21 comments of Yonghao Zhuang

Does memory management require some CPU concurrency? E.g., a buffer's last use is a SEND and that instruction has just been launched, while the next instruction is a FREE. Alpa does not actually...
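As an illustration of the hazard (a hedged sketch using `torch.distributed`, not Alpa's actual SEND/FREE instructions): an asynchronous send has to complete before the buffer it reads from can be released, otherwise the free races with the in-flight communication.

```python
# Illustrative sketch only -- not Alpa's runtime instructions.
# Assumes a torch.distributed process group is already initialized (e.g. NCCL).
import torch
import torch.distributed as dist

def send_then_free(buffer: torch.Tensor, dst: int):
    # SEND: launch the communication asynchronously.
    work = dist.isend(buffer, dst=dst)

    # FREE: releasing `buffer` right away would race with the in-flight send,
    # so some CPU-side synchronization is needed first.
    work.wait()   # block until the send has consumed the buffer
    del buffer    # now it is safe to drop the last reference
```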

The key point for swapping in XLA is that all parameters should already be on the GPU when launching an XlaExecutable. To address this: - When the model is not very...
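A minimal sketch of that constraint using plain JAX rather than Alpa's internals (the function and array names are illustrative): every parameter must be placed on the GPU device before the compiled executable is invoked.

```python
# Hedged sketch: illustrates "params must live on GPU before launch",
# not Alpa's actual swapping logic.
import jax
import jax.numpy as jnp

@jax.jit
def forward(params, x):
    return x @ params["w"] + params["b"]

gpu = jax.devices("gpu")[0]

# Parameters staged elsewhere (e.g. swapped out to host memory).
host_params = {
    "w": jnp.ones((1024, 1024)),
    "b": jnp.zeros((1024,)),
}

# Swap in: move all parameters to the GPU *before* launching the executable.
gpu_params = jax.device_put(host_params, gpu)
x = jax.device_put(jnp.ones((8, 1024)), gpu)

out = forward(gpu_params, x)  # the XlaExecutable only sees on-GPU buffers
```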

> ## CPU Compute Runtime
>
> - Add a global config to choose the runtime ("cpu" or "gpu") (https://github.com/alpa-projects/alpa/blob/main/alpa/global_env.py)
> - Replace all hard-coded "GPU" with that global configuration...
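A hedged sketch of what such a switch could look like (the `backend` attribute and its use are hypothetical and do not reflect the actual contents of `alpa/global_env.py`):

```python
# Hypothetical sketch of a global runtime switch; names are illustrative.
from dataclasses import dataclass

import jax

@dataclass
class GlobalConfig:
    # "gpu" or "cpu"; replaces hard-coded "gpu" strings elsewhere in the code.
    backend: str = "gpu"

global_config = GlobalConfig()

def get_backend_devices():
    # Code that previously called jax.devices("gpu") directly would consult
    # the global configuration instead of a hard-coded platform name.
    return jax.devices(global_config.backend)
```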

lgtm. Have you tested the new checkpoint size and reloading from the new checkpoint?

Good to hear that `pytorch_model.bin` can be removed. The size of `global_stepx` should also be reduced to the size of the adapters, but currently it also includes the backbone parameters (in...
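One way to verify what a checkpoint actually contains (a hedged sketch; the file path below is illustrative) is to load it and list the keys with their parameter counts:

```python
# Hedged sketch for inspecting a checkpoint's contents; the path is illustrative.
import torch

state_dict = torch.load("checkpoint/adapter_model.bin", map_location="cpu")

total = 0
for name, tensor in state_dict.items():
    total += tensor.numel()
    # An adapter-only checkpoint should contain essentially only "lora_" keys;
    # backbone weight keys showing up here means the full model was saved too.
    print(name, tuple(tensor.shape))

print(f"total parameters in checkpoint: {total:,}")
```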

Is this PR still ongoing? It seems that resuming from the LoRA adapter still does not restore the optimizer states and RNG states.

The current checkpoint from huggingface/deepspeed does not support this functionality. To store only the LoRA weights after the whole training process, see [here](https://github.com/lm-sys/FastChat/blob/4960ca702c66b9adaa65945746dba34f8d2c8ddc/fastchat/train/train_lora.py#L66). Maybe one can monkey-patch the HF trainer's/deepspeed's...
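For reference, a hedged sketch of extracting only the LoRA weights from a PEFT-wrapped model at the end of training (this mirrors the idea in the linked `train_lora.py`, but is not a copy of it):

```python
# Hedged sketch: save only the LoRA adapter weights at the end of training.
# Assumes `model` is a peft.PeftModel and `output_dir` is chosen by you.
import torch
from peft import get_peft_model_state_dict

def save_lora_only(model, output_dir: str):
    # Collect only the adapter parameters (lora_A / lora_B, etc.),
    # not the frozen backbone weights. Note: under DeepSpeed ZeRO-3 the
    # parameters may need to be gathered from the shards first.
    lora_state = get_peft_model_state_dict(model)
    torch.save(lora_state, f"{output_dir}/adapter_model.bin")
```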

Does your GPU support bfloat16? If not, please try to remove `--bf16 True`.
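A quick hedged check for bfloat16 support (PyTorch exposes a helper for this):

```python
# Quick check for bfloat16 support on the current CUDA device.
import torch

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    print("GPU supports bfloat16; keeping --bf16 True is fine.")
else:
    print("No bfloat16 support; remove --bf16 True (or use --fp16 True instead).")
```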

The `model.dtype` controls the output dtype of each layer; please print out the dtype of [this tensor](https://github.com/lm-sys/FastChat/blob/0e958b852a14f4bef5f0e9d7a5e7373477329cf2/fastchat/train/llama_flash_attn_monkey_patch.py#L30). If it is f32, please further check `self.q_proj.dtype`, which is supposed to be...
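A hedged debugging sketch that inspects the relevant dtypes from outside the monkey patch (the model path is illustrative; inside the patched forward you would instead print `hidden_states.dtype` and `self.q_proj.weight.dtype` directly):

```python
# Hedged sketch: check a loaded LLaMA model's dtypes; the path is illustrative.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama",            # illustrative path
    torch_dtype=torch.float16,  # the dtype you intend to train/serve with
)

print("model.dtype:", model.dtype)
layer0_attn = model.model.layers[0].self_attn
print("q_proj weight dtype:", layer0_attn.q_proj.weight.dtype)  # expect float16/bfloat16
```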

So you need to set the dtype of the model to float16/bfloat16. If you are using the train_lora script, I think you need to add these lines to your deepspeed...
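The comment is truncated, but the usual place for this is the fp16/bf16 sections of the DeepSpeed config. A hedged sketch as a Python dict that can be passed to the HF Trainer (or dumped to a JSON file); the exact values depend on your setup:

```python
# Hedged sketch of a DeepSpeed config enabling half precision; pass it to
# transformers.TrainingArguments(deepspeed=ds_config) or write it to a JSON file.
ds_config = {
    "bf16": {"enabled": "auto"},   # follows --bf16 from the command line
    "fp16": {"enabled": "auto"},   # follows --fp16 from the command line
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```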