
LOMO: LOw-Memory Optimization

35 LOMO issues

Hello, does LOMO support quantized models such as GPTQ? If so, scaling the memory requirements proportionally, with eight 24 GB GPUs and pipeline parallelism, would it be possible to LoRA-finetune a quantized 175B model? Thanks!

Hi, I'd like to run a 65B LLaMA with LOMO. What config should I use to run the training on an 8×RTX 3090 machine? It would be very nice if...

Hello friend! First of all, thanks for your amazing work!! If I use another data collator/dataset loader, would I still be able to train using the LOMO trainer class?

It is a classic idea to overlap the backward pass with the optimizer step. PyTorch supports this kind of overlap in DDP and FSDP; for example, DDP provides communication hooks: https://github.com/pytorch/pytorch/tree/main/torch/distributed/algorithms/ddp_comm_hooks...
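
For reference, a minimal sketch of registering one of those DDP communication hooks (not LOMO code; it assumes the process group is already initialized and the module is on the right device):

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

def wrap_with_comm_hook(module: torch.nn.Module) -> DDP:
    # Assumes torch.distributed.init_process_group(...) has already been called.
    ddp_model = DDP(module)
    # fp16_compress_hook casts each gradient bucket to fp16 before the all-reduce,
    # so the (compressed) communication overlaps with the rest of the backward pass.
    ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
    return ddp_model
```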

Is gradient accumulation still mathematically possible in this scheme? If so, we could train a 65B model on a single 3090 in a day and a half.
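
For context: LOMO fuses the parameter update into the backward pass and frees each gradient immediately, which is why plain accumulation is not straightforward there. The sketch below is the conventional accumulation loop the question refers to (all names are placeholders, not LOMO APIs):

```python
import torch

def train_with_accumulation(model, loader, optimizer, loss_fn, accumulation_steps=16):
    # Standard gradient accumulation: gradients from several micro-batches
    # are summed into p.grad before a single optimizer step is taken.
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        loss = loss_fn(model(inputs), targets) / accumulation_steps
        loss.backward()  # adds this micro-batch's gradients to p.grad
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()  # one update per accumulated effective batch
            optimizer.zero_grad()
```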

Does it mean LOMO is 11 times faster than AdamW? ![image](https://github.com/OpenLMLab/LOMO/assets/10653991/5a8116c3-f9f2-4e8c-b3a9-ef4ace9dff70)

Through comparative experiments, we found that what really reduces GPU memory is torch.set_default_dtype(torch.float16) and DeepSpeed. We ran the experiments with LLaMA-7B, using { "zero_optimization": { "stage": 0 }, "gradient_accumulation_steps": 1, "steps_per_print":...
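
A minimal sketch of the setup described in that issue (stage-0 ZeRO with an fp16 default dtype); any value not quoted in the issue is an assumption:

```python
import torch
import deepspeed

# Tensors created after this call (including model weights built later) default to fp16.
torch.set_default_dtype(torch.float16)

ds_config = {
    "zero_optimization": {"stage": 0},
    "gradient_accumulation_steps": 1,
    "train_micro_batch_size_per_gpu": 1,  # assumed; not quoted in the issue
    "fp16": {"enabled": True},            # assumed, to match the fp16 default dtype
}

# model = ...  # e.g. a LLaMA-7B module
# engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
```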

Personally, I feel that full-parameter fine-tuning still outperforms adapter methods like LoRA, so why hasn't LOMO taken off? I have already fine-tuned a 7B BLOOM with LOMO on two 24 GB GPUs and the whole workflow felt quite smooth, yet I can hardly find anyone using LOMO on any platform, which is strange.