tginart

22 comments by tginart

Hi @HamidShojanazeri, I am also seeing this issue. I have tried both `export NCCL_ASYNC_ERROR_HANDLING=1` and `export TORCH_NCCL_ASYNC_ERROR_HANDLING=1`, but I still get the error: `torch.distributed.DistBackendError: [14] is setting up NCCL communicator...`
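For anyone else hitting this: these variables must be set in the environment of every rank before the launcher starts. A minimal sketch (the launcher invocation and script name below are placeholders, not from the original thread):

```shell
# Newer PyTorch builds read the TORCH_-prefixed name; older builds read the
# unprefixed one. Exporting both covers either case.
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export NCCL_ASYNC_ERROR_HANDLING=1

# Then launch as usual, e.g.:
# torchrun --nproc_per_node=8 train.py
```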

Thank you for the interest and the question. The proof that uses the gamma correction is Lemma C.3. It's not that the cluster size is guaranteed, but it's that in...

Thank you for bringing this up. After looking back at my notes, I do think that the denominators you've circled are mistakes and should be as you've mentioned. Actually though,...

Y'all may find this script helpful:

```python
import argparse

import loralib as lora
import transformers
from tqdm import tqdm


def lora_process(model_name, max_seq_len, attn_impl, r_emb, r):
    print("Loading model configurations...")
    config = ...
```

> > except that it's applied in a somewhat nonstandard way in the fwd pass of the transformer module
>
> @tginart Can you say more about this? https://github.com/mosaicml/llm-foundry/blob/86864e90e0063651177837e831fe48e80618b969/llmfoundry/models/mpt/modeling_mpt.py#LL485C1-L487C1

@samhavens...

Hi! No, this did not. I suspect there is some kind of issue in the current Docker env with the StreamingDataset. For example, running this script:

```python
import numpy...
```

FYI that script is pulled from the Streaming docs: https://docs.mosaicml.com/projects/streaming/en/stable/getting_started/quick_start.html

@Paladiamors Any luck with getting Triton's flash attention set up? I've tried 3 different machines/GPU types and close to a half dozen different envs/images and can't get that package to...

Interesting! My issue was also on the A10. I'm using the `g5.12xlarge` instance type with the `Deep Learning AMI GPU PyTorch 1.13.1 (Ubuntu 20.04)` AMI. I tried both the mosaicml/pytorch...