
What H/W do you need to fine-tune CodeGen?

Open smith-co opened this issue 1 year ago • 6 comments

I would like to fine-tune the CodeGen model.

What H/W would you need to fine-tune a CodeGen model?

What are the GPU requirements?

smith-co avatar Nov 17 '22 18:11 smith-co

Not a comprehensive answer, but I’ll share my experience.

I fine-tuned the 350M model on a single A100 with 40 GB of memory, with a batch size of 10 and an input length of 512 tokens.

That used 80-90% of the GPU memory.
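For a rough sense of where that memory goes, here is a back-of-the-envelope sketch, not a measurement. It assumes plain fp32 training with Adam, i.e. about 16 bytes per parameter (4 for weights, 4 for gradients, 8 for the two Adam moment buffers), and it uses the nominal parameter counts from the model names. The function name `training_state_gib` is made up for illustration, and activations and framework overhead are deliberately left out.

```python
def training_state_gib(num_params: float, bytes_per_param: int = 16) -> float:
    """Estimate memory for weights + gradients + Adam state, in GiB.

    Assumption: fp32 training with Adam, so 4 (weights) + 4 (gradients)
    + 8 (Adam first and second moments) = 16 bytes per parameter.
    Activations are NOT included; they grow with batch size and sequence length.
    """
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    # Nominal sizes taken from the model names, not exact parameter counts.
    for name, num_params in [("350M", 350e6), ("16B", 16e9)]:
        gib = training_state_gib(num_params)
        print(f"{name}: ~{gib:.1f} GiB for weights, gradients and optimizer state")
```

Under these assumptions the 350M model needs only about 5 GiB for weights, gradients and optimizer state, so most of the 80-90% reported above would be activations for a batch of 10 sequences of 512 tokens plus overhead. The same arithmetic puts the 16B model at well over 200 GiB before activations, which would not fit on a single 40 GB card without sharding, offloading, or a parameter-efficient method.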

alecsharpie avatar Nov 27 '22 10:11 alecsharpie

@alecsharpie thanks for sharing. I would like to do the same for a new programming language, but I am having difficulty using the jaxformer implementation. If you have any examples to share, they would be welcome! Did you use the DeepSpeed library?

Extremys avatar Nov 27 '22 10:11 Extremys

@alecsharpie thanks for sharing.

Has anyone attempted to fine-tune the 16B model, and if so, what kind of resources were used?

nashid avatar Nov 28 '22 03:11 nashid

@alecsharpie were you able to generate any proper code from a plain English prompt? If yes, how are you doing that? I am running the code on Kaggle, but it doesn't seem to be doing anything at all.

SubhajitC-Hexaware avatar Nov 29 '22 07:11 SubhajitC-Hexaware

@SubhajitC-Hexaware Only very inconsistently with the 350M model. Even code generated from code prompts isn't consistent for me at this number of parameters.

alecsharpie avatar Jan 11 '23 11:01 alecsharpie

@Extremys I used Hugging Face.

alecsharpie avatar Jan 11 '23 11:01 alecsharpie