
GPU Memory Usage

Open • guanzhchen opened this issue 1 year ago • 1 comment

Hi, thanks for your awesome work. In my test on 8xA800, why does USP with ulysses_degree=8 and ring_degree=1 use more GPU memory than naive Ulysses?

guanzhchen, Aug 02 '24 09:08

All2All needs some temporary buffers for async P2P. Could you post the memory difference? In my experience it is very small.

feifeibear, Aug 05 '24 06:08
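
To report the memory difference asked for above, one option is to record the peak allocation of each configuration with PyTorch's built-in counters. The following is only a minimal sketch: `peak_memory_bytes` and `format_memory_diff` are hypothetical helper names, not part of long-context-attention, and the callables passed in (for example one forward/backward step with naive Ulysses and one with USP) are assumed to be supplied by the user.

```python
def peak_memory_bytes(fn, device=None):
    """Run fn() once and return the peak bytes PyTorch allocated on `device`."""
    import torch  # imported lazily so format_memory_diff works without a GPU

    torch.cuda.synchronize(device)
    torch.cuda.reset_peak_memory_stats(device)  # start from a clean peak counter
    fn()                                        # e.g. one attention forward/backward step
    torch.cuda.synchronize(device)              # wait for async kernels and comms
    return torch.cuda.max_memory_allocated(device)


def format_memory_diff(baseline_bytes, candidate_bytes):
    """Summarize two peak-memory readings in MiB with the signed difference."""
    mib = 1024 ** 2
    diff = candidate_bytes - baseline_bytes
    pct = 100.0 * diff / baseline_bytes if baseline_bytes else float("nan")
    return (
        f"baseline={baseline_bytes / mib:.1f} MiB, "
        f"candidate={candidate_bytes / mib:.1f} MiB, "
        f"diff={diff / mib:+.1f} MiB ({pct:+.2f}%)"
    )
```

On each rank, something like `format_memory_diff(peak_memory_bytes(run_ulysses_step), peak_memory_bytes(run_usp_step))` would give a line that can be pasted into the issue, where `run_ulysses_step` and `run_usp_step` are placeholders for the two test cases. Note that `torch.cuda.max_memory_allocated` only counts memory managed by PyTorch's caching allocator, so buffers that the communication library allocates on its own may not show up here; comparing against `nvidia-smi` can help cover that gap.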