Hao Zhang

Results: 19 issues by Hao Zhang

We have seen many complaints that the required peak memory is too high. We would like to keep the peak memory of the following command below 8 GB: ``` python3 -m...
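For reproducing the complaint, one way to measure a command's peak memory is the standard library's `resource` module. This harness is an assumption for illustration, not part of the issue, and the actual command is truncated above:

```python
import resource
import subprocess
import sys

# Run the target command (passed on our own command line) as a child process.
# NOTE: this measurement harness is an assumption, not part of the issue.
cmd = sys.argv[1:]  # e.g. ["python3", "-m", ...] (the issue's command is truncated)
subprocess.run(cmd, check=True)

# ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"peak RSS of child processes: {peak_kb / 1024**2:.2f} GB (Linux units)")
```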

Label: good first issue

- [ ] Support CLI inference of Flan-T5 (a minimal sketch follows this list)
- [ ] Support web UI serving of Flan-T5
- [ ] Support fine-tuning of Flan-T5
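As a starting point for the first item, a minimal sketch of Flan-T5 CLI inference using Hugging Face `transformers`. The checkpoint `google/flan-t5-base` and the REPL loop are illustrative assumptions, not the project's actual CLI:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed checkpoint; the issue does not pin a specific Flan-T5 size.
MODEL = "google/flan-t5-base"

def main():
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)
    # Simple REPL-style loop: read a prompt, generate, print.
    while True:
        prompt = input("USER: ")
        if not prompt:
            break
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=128)
        print("ASSISTANT:", tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```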

Label: good first issue

## Why are these changes needed?

We are going to use [xFormers](https://github.com/facebookresearch/xformers) instead of flash attention. xFormers is preferable because:

- It supports more GPU architectures than flash attention, including...
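For context, a minimal sketch of the memory-efficient attention call in xFormers; the tensor shapes and dtype are assumptions, and this is not the PR's actual integration:

```python
import torch
import xformers.ops as xops

# Assumed shapes: (batch, seq_len, num_heads, head_dim).
B, M, H, K = 2, 1024, 16, 64
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# Memory-efficient attention; xFormers dispatches to a kernel suited to
# the current GPU architecture, which is the portability argument above.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)  # torch.Size([2, 1024, 16, 64])
```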

We could use the strategy trainer interface and Ray to implement the GeePS training strategy, which would be useful when single-GPU memory does not suffice for training.
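GeePS is a parameter-server design for GPU training; a minimal sketch of that pattern with Ray actors follows. All class and function names here are illustrative, not the actual strategy trainer interface:

```python
import numpy as np
import ray

ray.init()

@ray.remote
class ParameterServer:
    """Holds the shared parameters; workers pull params and push gradients."""
    def __init__(self, dim):
        self.params = np.zeros(dim, dtype=np.float32)

    def get_params(self):
        return self.params

    def apply_gradients(self, grad, lr=0.1):
        self.params -= lr * grad
        return self.params

@ray.remote
def worker_step(ps, dim):
    # Pull the latest parameters, compute a (placeholder) gradient, push it back.
    params = ray.get(ps.get_params.remote())
    grad = np.random.randn(dim).astype(np.float32)  # stand-in for a real gradient
    return ray.get(ps.apply_gradients.remote(grad))

dim = 8
ps = ParameterServer.remote(dim)
# Two workers repeatedly step against the shared parameter server.
for _ in range(4):
    ray.get([worker_step.remote(ps, dim) for _ in range(2)])
print(ray.get(ps.get_params.remote()))
```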