
GLake: optimizing GPU memory management and IO transmission.

Results: 12 glake issues

Hi, this is a really cool piece of work! I was wondering whether it is compatible with PyTorch 2.3.0? And if not, whether there is any way to make it...

Thanks for sharing! I greatly appreciate your work on reducing CUDA memory fragmentation. I recently integrated GMLake into torch 2.1.0 and it compiled without errors. I would like...

Before pip install torch-1.13.0a0+git49444c3-cp38-cp38-linux_x86_64.whl:

    root@worker-0:/workspace# python
    Python 3.8.13 packaged by conda-forge (default, Mar 25 2022, 06:04:10)
    [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> print(torch.__version__)
    1.13.1+cu117

After pip install torch-1.13.0a0+git49444c3-cp38-cp38-linux_x86_64.whl:

    root@worker-0:/workspace#...

Could you please provide the wheel file corresponding to PyTorch 1.13.1?

Very cool work! We really hope to use GLake in our LLM training. However, compilation failed when I tried to build GLake against the PyTorch 2.1 release. My system information and error...

https://github.com/pytorch/pytorch/pull/96995
https://github.com/pytorch/pytorch/blob/95a86ed9ca107329151e0dc172386d50dd3471c6/c10/cuda/CUDACachingAllocator.cpp#L311-L324

> The expandable_segments:True option is used to enable/disable this behavior. We use cuda's low-level memory APIs, which are similar to mmap, to extend the memory segments. These APIs...
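
For readers who want to try the option quoted above, here is a minimal sketch of switching it on from Python. `PYTORCH_CUDA_ALLOC_CONF` is the environment variable that PyTorch's CUDA caching allocator reads for its comma-separated `name:value` options. The helper name `enable_expandable_segments` is hypothetical and only for illustration. The variable has to be set before the first CUDA allocation, so setting it before `import torch` is the safe choice.

```python
import os

def enable_expandable_segments(env=os.environ):
    """Add expandable_segments:True to PYTORCH_CUDA_ALLOC_CONF, keeping other options.

    Hypothetical helper for illustration; it only edits the environment mapping.
    """
    key = "PYTORCH_CUDA_ALLOC_CONF"
    # Existing options are comma-separated "name:value" pairs.
    options = [opt for opt in env.get(key, "").split(",") if opt]
    # Drop any earlier expandable_segments setting so the new value wins.
    options = [opt for opt in options if not opt.startswith("expandable_segments:")]
    options.append("expandable_segments:True")
    env[key] = ",".join(options)
    return env[key]

# Run this before the first CUDA allocation (ideally before `import torch`).
print(enable_expandable_segments())
```

Merging with the existing value, rather than overwriting it, keeps any other allocator options (such as `max_split_size_mb`) that are already configured.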

# Training
- [ ] Multi-stream Memory Reuse: Done, will be released
- [ ] Compatible with Expandable Segment
- [ ] Memory Pattern Profiling tool
- [ ] DoubleOverlapping(for...

vAttention says that with a 2 MB page size, up to 128 MB of physical memory can be wasted per request in the worst case for Llama-3-8B (TP-1), whereas with a 64 KB page size the worst-case waste would be only 4 MB. Do...
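
The two figures quoted above are consistent with a simple model in which the worst case leaves one unused page in each of 64 separately mapped buffers, since 128 MB / 2 MB = 64 and 4 MB / 64 KB = 64. The sketch below reproduces that arithmetic. The buffer count of 64 is an assumption inferred from the numbers (for example, one K and one V buffer for each of 32 layers), and `worst_case_waste` is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: 64 buffers, e.g. one K and one V buffer for each of 32 layers.
NUM_BUFFERS = 64

def worst_case_waste(page_size_bytes, num_buffers=NUM_BUFFERS):
    """Upper bound on wasted physical memory: one unused page per buffer."""
    return num_buffers * page_size_bytes

for page_size in (2 * MIB, 64 * KIB):
    waste = worst_case_waste(page_size)
    print(f"page size {page_size // KIB} KiB -> worst-case waste {waste // MIB} MiB")
```

Under this model the waste scales linearly with page size, so moving from 2 MB to 64 KB pages cuts the bound by a factor of 32.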