Jiarui Fang(方佳瑞)
When I run `python3 run_autoshard.py`, I encounter the following error:
File "/home/lcfjr/miniconda3/envs/dev/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
File "/home/lcfjr/miniconda3/envs/dev/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
File "/home/lcfjr/codes/autoshard/autoshard/training.py", line 474, in act...
FWD_FLAG="--fwd_only" NHEADS=8 HEAD_SIZE=128 GROUP_NUM=1 BS=2 Ulysses degree=8
- SEQLEN=16384
- SEQLEN=8192

**Conclusion:** the longer the context, the lower the communication-to-computation ratio, because attention computation scales as O(N^2) while communication scales as O(N) (a rough back-of-the-envelope check is sketched after this list).
- Ulysses Degree=8
- Ring...
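To illustrate why the ratio improves with sequence length, here is a minimal back-of-the-envelope sketch. It is not taken from the benchmark script; the FLOP and all-to-all volume formulas below are simplifying assumptions, only meant to show the O(N^2) vs. O(N) scaling for the configuration above.

```python
# Back-of-the-envelope estimate: attention compute is O(N^2) in sequence
# length, while Ulysses all-to-all communication is O(N), so the
# communication/computation ratio shrinks as the context grows.
# Constant factors here are simplifying assumptions, not measured values.

NHEADS, HEAD_SIZE, BS = 8, 128, 2   # matches the benchmark config above
BYTES_PER_ELEM = 2                  # fp16/bf16

def attn_flops(seqlen):
    # QK^T and PV matmuls: ~2 * 2 * BS * NHEADS * SEQLEN^2 * HEAD_SIZE FLOPs
    return 4 * BS * NHEADS * HEAD_SIZE * seqlen ** 2

def ulysses_comm_bytes(seqlen):
    # Per attention layer, Ulysses all-to-alls Q, K, V and the output:
    # roughly 4 tensors of shape [BS, SEQLEN, NHEADS, HEAD_SIZE].
    return 4 * BS * seqlen * NHEADS * HEAD_SIZE * BYTES_PER_ELEM

for seqlen in (8192, 16384):
    ratio = ulysses_comm_bytes(seqlen) / attn_flops(seqlen)
    print(f"SEQLEN={seqlen}: comm/compute ratio = {ratio:.2e} bytes/FLOP")
# Doubling SEQLEN doubles communication but quadruples computation,
# so the bytes-per-FLOP ratio is halved.
```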
Hello developers, I saw the news about the OpenPPL update: https://mp.weixin.qq.com/s/L35pj8fYakvYnL4LYu6nuw and would like to benchmark the speed of flash decoding in this project. How can I reproduce the results from the article? Is there a test script? Also, what is the difference between the flash decoding in this project and the one in the flash-attn project?
Thanks for open-sourcing FA3, good job! I am wondering about the FP8 feature. **Compatibility**: Are the NVIDIA L40 and A100 GPUs compatible with the Flash Attention 3 FP8 feature? **Performance**:...
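For context on the compatibility question, here is a minimal runtime check. It is my own illustration, not part of FA3, and it assumes the FA3 kernels target Hopper-class (compute capability 9.x) GPUs; whether the FP8 path can extend to Ada (L40) or Ampere (A100) is exactly what is being asked above.

```python
# Minimal sketch (assumption: FA3 kernels target Hopper, compute capability 9.x).
# A100 is sm80 (Ampere) and L40 is sm89 (Ada), so this check would report them
# as non-Hopper devices.
import torch

def is_hopper(device: int = 0) -> bool:
    major, _minor = torch.cuda.get_device_capability(device)
    return major >= 9

if torch.cuda.is_available():
    print(f"{torch.cuda.get_device_name(0)}: Hopper-class = {is_hopper(0)}")
```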
We greatly appreciate the open-source efforts of CogVideo, which have been very helpful in advancing the field of video generation. To achieve real-time generation, the DiT model must be deployed...
We are excited to announce that our xDiT project, a DiT parallel inference engine, has recently added support for HunyuanDiT's parallel inference. By leveraging CFG Parallel and PipeFusion Parallel, HunyuanDiT...
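To make the CFG Parallel idea concrete, here is a conceptual sketch (my own illustration, not xDiT's actual API). Classifier-free guidance requires a conditional and an unconditional model evaluation at every denoising step, so with two ranks each branch can run on its own rank and the guided output is formed after a single gather.

```python
# Conceptual sketch of CFG Parallel (illustration only, not xDiT's API).
# Classifier-free guidance needs two model evaluations per denoising step
# (conditional and unconditional); with 2 ranks, each rank computes one
# branch and the guided output is formed after one all_gather.
import torch
import torch.distributed as dist

def cfg_parallel_step(model, latents, t, cond_emb, uncond_emb, guidance_scale):
    rank = dist.get_rank()
    # Rank 0 evaluates the conditional branch, rank 1 the unconditional one.
    emb = cond_emb if rank == 0 else uncond_emb
    local_out = model(latents, t, emb)

    # Exchange the two branches so every rank can apply the guidance formula.
    outs = [torch.empty_like(local_out) for _ in range(2)]
    dist.all_gather(outs, local_out)
    cond_out, uncond_out = outs[0], outs[1]
    return uncond_out + guidance_scale * (cond_out - uncond_out)
```

PipeFusion Parallel is orthogonal to this: it additionally splits the DiT blocks and image patches across devices in a pipelined fashion, so the two schemes compose.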
Dear Open-Sora-Plan developers, This is Jiarui from the [xDiT](https://github.com/xdit-project/xDiT) project. xDiT is an inference engine designed for the large-scale parallel deployment of DiTs. xDiT provides a suite of efficient...
Thank you for your work on LongRecipe; I noticed the recent release of the technical report. We have been following the EasyContext project for a long time, and both LongRecipe and...
### Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest...