Mayank Mishra

187 comments by Mayank Mishra

> @mayank31398, I think the formatting issues can be fixed by upgrading pre-commit and clang-format. I am not seeing any issues with the formatting in the CI. Are you suggesting...

@syp1997 can you tell me how you are launching the job?

If you launch via the Makefile, that shouldn't be a problem, since I have set it to use only 1 worker in the Makefile.

This is the script used for launching 176B: https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/tr11-176B-ml.slurm

The architecture is not the same, since BLOOM uses ALiBi and GPT uses absolute position embeddings.
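To illustrate the difference: with absolute position embeddings, a position vector is added to each token embedding at the input, whereas ALiBi adds no position embeddings and instead biases each attention head's scores with a penalty proportional to the query-key distance. Below is a minimal pure-Python sketch of the ALiBi bias, assuming a power-of-two number of heads (non-power-of-two head counts need extra slope handling that is not shown). The function names `alibi_slopes` and `alibi_bias` are hypothetical and are not taken from BLOOM or Megatron.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes: a geometric sequence starting at 2^(-8/n) with the same ratio.

    Assumption: num_heads is a power of two (this sketch rejects other values).
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2 ** (-8 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Causal ALiBi bias, indexed as bias[head][query][key].

    Past keys get -slope * distance (a linear penalty); future keys are masked
    with -inf. The result is added to the attention scores before the softmax.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [
            [-m * (q - k) if k <= q else -math.inf for k in range(seq_len)]
            for q in range(seq_len)
        ]
        for m in slopes
    ]
```

Because the bias depends only on relative distance, nothing position-specific is learned at the input, which is why a GPT-style launch script with absolute embeddings cannot be reused unchanged.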

For StarCoder, 4D parallelism is used: tensor parallelism, pipeline parallelism, sequence parallelism, and data parallelism. This is the repo used for StarCoder and SantaCoder training: https://github.com/bigcode-project/Megatron-LM
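Here is a rough sketch of how these dimensions compose. Tensor and pipeline parallelism together define one model replica, and data parallelism replicates it. In Megatron-style sequence parallelism, activations are split along the sequence dimension inside the existing tensor-parallel group, so it adds no GPUs. The helper names and the rank ordering (TP fastest-varying, then DP, then PP) are assumptions for illustration. Check the linked repo's initialization code for the exact layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelCoords:
    tp: int  # tensor-parallel rank (sequence parallelism reuses this group)
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage


def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """GPUs left for data parallelism once each replica uses tp * pp GPUs."""
    if world_size % (tp * pp):
        raise ValueError("world_size must be divisible by tp * pp")
    return world_size // (tp * pp)


def rank_to_coords(rank: int, world_size: int, tp: int, pp: int) -> ParallelCoords:
    """Map a global rank to (tp, dp, pp) under the assumed TP-DP-PP ordering."""
    dp = data_parallel_size(world_size, tp, pp)
    return ParallelCoords(tp=rank % tp, dp=(rank // tp) % dp, pp=rank // (tp * dp))
```

For example, a hypothetical job on 512 GPUs with TP=4 and PP=4 would leave 32 data-parallel replicas under this layout.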

Hey @brian-pieces, check this one out: https://github.com/huggingface/peft/pull/400

Trying to get this in.

@sevenandseven, this repo is no longer being maintained. I suggest using vLLM or TGI.

Hi, this repo is no longer maintained.

Um, I am not sure. Maybe your process is getting stuck somewhere?