
46 GaLore issues

In single-GPU mode, I successfully ran training on an RTX 3090, but it took too long. In DDP mode, we get an OOM at `LlamaForCausalLM = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank, broadcast_buffers=False)`.
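For context, a minimal sketch of the DDP setup described in this issue, assuming a `torchrun` launch that sets `LOCAL_RANK` (the checkpoint name is illustrative; the issue does not say which LLaMA variant is used):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import LlamaForCausalLM

# Assumes launch via `torchrun`, which sets LOCAL_RANK and the rendezvous env vars.
local_rank = int(os.environ["LOCAL_RANK"])
dist.init_process_group(backend="nccl")
torch.cuda.set_device(local_rank)

# Illustrative checkpoint name.
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").to(local_rank)
model = DDP(
    model,
    device_ids=[local_rank],
    output_device=local_rank,
    broadcast_buffers=False,
)
```

DDP keeps per-rank gradient buckets in addition to the model's own parameters and gradients, so peak memory per GPU is higher than in single-GPU training; on a 24 GB RTX 3090 that extra headroom can be enough to trigger the OOM.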

Can GaLore support the LLaVA model?

Hi, thanks for the good work. I'm trying to integrate this into Colossal-AI (https://github.com/hpcaitech/ColossalAI), making it compatible with tensor parallelism and ZeRO. However, I had trouble loading the dataset; it seems they updated the...
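If the dataset trouble here is the relocation of C4 on the Hugging Face Hub (a guess from the truncated snippet), a sketch of the usual workaround:

```python
from datasets import load_dataset

# The original "c4" loading script was removed from the Hub; the data now
# lives under the allenai/ namespace, so older load_dataset("c4", ...) calls
# fail. Streaming keeps the download lazy, as in the repo's pre-training setup.
train_data = load_dataset("allenai/c4", "en", split="train", streaming=True)
val_data = load_dataset("allenai/c4", "en", split="validation", streaming=True)
```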

Hi, thanks very much for sharing your impressive work! Would it be possible to release the trained model (e.g., using the script below)? It would greatly facilitate reproducibility efforts. Thank...

[Jamba](https://huggingface.co/ai21labs/Jamba-v0.1) is a very interesting new model and I'd love to add GaLore support for finetuning it. It's an MoE+Transformer+Mamba hybrid, so I'm not sure how that would work...
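One way GaLore could be scoped to the Transformer parts of such a hybrid, following the param-group pattern shown in the GaLore README. `model` is assumed to be an already-loaded Jamba instance, and the module-name filters and hyperparameter values below are illustrative, not Jamba's actual naming or tuned settings:

```python
from galore_torch import GaLoreAdamW

# Idea: apply the low-rank projection only to 2D weights of attention/MLP
# linears, and leave Mamba/SSM and MoE-router parameters in the regular group.
# `model` is assumed to be an already-loaded Jamba instance.
galore_params, regular_params = [], []
for name, param in model.named_parameters():
    if param.dim() == 2 and ("attn" in name or "mlp" in name):
        galore_params.append(param)
    else:
        regular_params.append(param)

# Param-group keys follow the usage shown in the GaLore README;
# the values here are illustrative.
param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-2)
```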

Hi, thank you for generously open-sourcing your excellent work. During our experiments, we noticed that there doesn't seem to be a resume/reload path for the optimizer state when using `args.continue_from`. Is our...
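A sketch of what such a resume path could look like, assuming checkpoints are directories holding the HF model files plus a separate optimizer file (the file name and the `update_step` key are hypothetical, not the repo's actual convention):

```python
import os

import torch

def save_checkpoint(path, model, optimizer, scheduler, update_step):
    # Hypothetical layout: HF model files plus an optimizer.pt side file.
    os.makedirs(path, exist_ok=True)
    model.save_pretrained(path)
    torch.save(
        {
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "update_step": update_step,
        },
        os.path.join(path, "optimizer.pt"),
    )

def resume_from(path, optimizer, scheduler):
    # Restore optimizer/scheduler state and return the step to resume at.
    state = torch.load(os.path.join(path, "optimizer.pt"), map_location="cpu")
    optimizer.load_state_dict(state["optimizer"])
    scheduler.load_state_dict(state["scheduler"])
    return state["update_step"]
```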

Attempting to use GaLore to finetune a Phi model yields "AttributeError: 'PhiConfig' object has no attribute 'rms_norm_eps'", which, having seen that error in other LLM tooling, typically translates to "this...
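A possible defensive fix, assuming the failure comes from code reading `config.rms_norm_eps` unconditionally; Phi uses standard LayerNorm rather than RMSNorm, and its config exposes `layer_norm_eps` instead, so a fallback avoids the AttributeError:

```python
# Assumes the failing code reads config.rms_norm_eps unconditionally.
# PhiConfig exposes layer_norm_eps (Phi uses LayerNorm, not RMSNorm),
# so fall back to it; 1e-5 is a conventional default epsilon.
eps = getattr(config, "rms_norm_eps", None)
if eps is None:
    eps = getattr(config, "layer_norm_eps", 1e-5)
```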

Hi, thanks for releasing this work! It has all been very interesting to read. However, I do have a few questions regarding your results and methodology. 1. For Table 4....

How exactly did you measure perplexity during pre-training with GaLore (e.g., when creating Figure 5 in your paper, https://arxiv.org/pdf/2403.03507.pdf)? Thanks.
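For reference, the standard recipe is to exponentiate the average per-token cross-entropy on a held-out split; whether this matches the paper's exact protocol is an assumption:

```python
import math

import torch

# Assumes batches with input_ids and attention_mask. The token-count
# weighting is approximate; exact weighting depends on padding and the
# loss's ignore_index handling.
@torch.no_grad()
def evaluate_perplexity(model, eval_loader, device):
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for batch in eval_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        out = model(**batch, labels=batch["input_ids"])
        n_tokens = batch["attention_mask"].sum().item()
        total_loss += out.loss.item() * n_tokens  # out.loss is a per-token mean
        total_tokens += n_tokens
    return math.exp(total_loss / total_tokens)
```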