
Running CodeLlama-13B on a single GPU

Open manoj21192 opened this issue 1 year ago • 10 comments

In the README it is mentioned that to run the 13B model, the MP value should be 2. I have only 1 GPU; is there a way to run this model on a single GPU? (I am fine if efficiency is lost; what I care about right now is just being able to run the 13B model.)

manoj21192 avatar Sep 01 '23 09:09 manoj21192

accumulation_steps = 2  # number of mini-batches over which to accumulate gradients

for batch_index, batch in enumerate(data_loader):
    # Forward pass and loss computation
    loss = model(batch)

    # Scale the loss so the accumulated gradient averages over accumulation_steps batches
    loss = loss / accumulation_steps

    # Backward pass: gradients are accumulated into the parameters' .grad buffers
    loss.backward()

    if (batch_index + 1) % accumulation_steps == 0:
        # Update the parameters once every accumulation_steps batches, then reset gradients
        optimizer.step()
        optimizer.zero_grad()

By using gradient accumulation, you can effectively simulate a larger batch size while training on a single GPU.
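For example, with a per-step batch size of 4 and accumulation_steps = 2, each optimizer.step() uses gradients averaged over 8 samples, so the update behaves like a batch of 8 even though only 4 samples sit in GPU memory at a time. Keep in mind that this mainly saves the memory that large batches would need; the model weights and optimizer state themselves still have to fit on the single GPU.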

GaganHonor avatar Sep 02 '23 03:09 GaganHonor

@GaganHonor: A few doubts, though I understand the above code:

  1. Since I have only 1 GPU, do I need to set accumulation_steps = 1 for the 13B model, whose MP value is 2?
  2. In which file do the above changes need to be made?

manoj21192 avatar Sep 03 '23 03:09 manoj21192

@GaganHonor: A few doubts, though I understand the above code:

  1. Since I have only 1 GPU, do I need to set accumulation_steps = 1 for the 13B model, whose MP value is 2?
  2. In which file do the above changes need to be made?

Try it and test it (personal opinion), and it's in the config.

GaganHonor avatar Sep 04 '23 01:09 GaganHonor

@GaganHonor: A few doubts, though I understand the above code:

  1. Since I have only 1 GPU, do I need to set accumulation_steps = 1 for the 13B model, whose MP value is 2?
  2. In which file do the above changes need to be made?

Try it and test it (personal opinion), and it's in the config.

I am sorry, but I am unable to find any .py file that contains the above code. I have cloned the repository from GitHub; could you please tell me the name of the Python file where this code lives? I cannot find any config.py file. I know code like this must be running somewhere in the backend, but I am unable to locate it in order to make the changes needed to run the 13B model on a single GPU.

manoj21192 avatar Sep 04 '23 08:09 manoj21192

SOURCE: 13B MODEL GENERATED THIS ANSWER FOR YOU 💀

If you are unable to locate the accumulation_steps variable in the codebase, you can try the following steps to find it:

  1. Search for the variable: Use your text editor's search functionality to search for the variable accumulation_steps within the codebase. This will help you locate where it is defined and used.

  2. Check related files: Look for files or modules that are related to model training or optimization. Common names for such files include train.py, model.py, or files that contain functions related to training or optimization.

  3. Look for the model training loop: The accumulation_steps variable is typically used within a loop that iterates over the dataset batches for training. Look for a loop that iterates over the data loader or dataset and performs the forward pass, loss computation, backward pass, and parameter update steps.

  4. Consult documentation or code comments: If you are working with a codebase that has documentation or code comments, check whether there are any references or explanations regarding the usage of accumulation_steps.

  5. Seek assistance from the code author or community: If you are still unable to find the accumulation_steps variable, consider reaching out to the code author or the community associated with the codebase. They may be able to provide specific guidance or point you to the relevant code section.

Remember to adapt the modifications to the appropriate location once you find the accumulation_steps variable in the codebase.
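
For the first step, here is a minimal sketch of searching the checkout from Python, assuming you run it from the root of the cloned repository (a plain grep -rn accumulation_steps . does the same job):

from pathlib import Path

# Walk every .py file under the current directory and report lines
# that mention the variable name we are looking for.
for path in Path(".").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if "accumulation_steps" in line:
            print(f"{path}:{lineno}: {line.strip()}")

If this prints nothing, the variable simply is not defined anywhere in the repository, and the snippet above would have to be added to your own training script.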

GaganHonor avatar Sep 04 '23 16:09 GaganHonor

https://download2.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoicG1jeTAzeW9qYXYzOHdodGtzYXRjaWhwIiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQyLmxsYW1hbWV0YS5uZXRcLyoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2OTQwNzY5MDB9fX1dfQ__&Signature=c0CaG3Ph%7EGra7rdugunQaTLGh9d2MjpcUsg%7E7gNLeMuz94p%7EYeC4wKHC0nWM-S5SLaXCNP85cGavjI1VDvpCrtdKHhDWifaVJuJYr1XrU1oP1aSlMw0auEfO2ZLxQ2IgIwaKgcrcgwWrUvylJyThEQCUQNVqk5fp466hHj%7EfM%7EG1AbXFrsgh5LNw3m81zkCeloWC7isnSGwqUpSofUrQVFdsPRab55dIsMxTiX9r3gtpRnb9hbN%7E7YHFwI2I4hAg51iFASEqbpQP8p9ckzEaYupO93Ico8CCXS%7EQpxqcF860LxYgAgYL%7EPur8E9Msez0P30bFF8RVttCLL9D7O7wCA__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=208515475329671

geromepamintuan avatar Sep 06 '23 10:09 geromepamintuan

https://download2.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoicG1jeTAzeW9qYXYzOHdodGtzYXRjaWhwIiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQyLmxsYW1hbWV0YS5uZXRcLyoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2OTQwNzY5MDB9fX1dfQ__&Signature=c0CaG3Ph%7EGra7rdugunQaTLGh9d2MjpcUsg%7E7gNLeMuz94p%7EYeC4wKHC0nWM-S5SLaXCNP85cGavjI1VDvpCrtdKHhDWifaVJuJYr1XrU1oP1aSlMw0auEfO2ZLxQ2IgIwaKgcrcgwWrUvylJyThEQCUQNVqk5fp466hHj%7EfM%7EG1AbXFrsgh5LNw3m81zkCeloWC7isnSGwqUpSofUrQVFdsPRab55dIsMxTiX9r3gtpRnb9hbN%7E7YHFwI2I4hAg51iFASEqbpQP8p9ckzEaYupO93Ico8CCXS%7EQpxqcF860LxYgAgYL%7EPur8E9Msez0P30bFF8RVttCLL9D7O7wCA__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=208515475329671

I have already downloaded all the models; I didn't understand how that is going to resolve my query.

manoj21192 avatar Sep 06 '23 11:09 manoj21192

You may want to take a look at https://github.com/facebookresearch/codellama/issues/82 for quantization if the use case is inference only.

If you can run batch_size = 1, then as discussed above (https://github.com/facebookresearch/codellama/issues/77#issuecomment-1703664262), gradient accumulation can help you simulate large-batch training.

If you cannot even run batch_size = 1, then the only way I can think of is CPU offloading (a fairly naive form of pipeline parallelism) for part of the model, but I presume that requires a lot of heavy lifting on your side.
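
To make the quantization route concrete, here is a minimal inference-only sketch. It is an assumption-heavy example: it uses the Hugging Face port of the weights (codellama/CodeLlama-13b-Instruct-hf) together with the transformers, accelerate and bitsandbytes packages, and it bypasses this repository's torchrun/MP machinery entirely, so treat it as one possible approach rather than the exact recipe from #82:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codellama/CodeLlama-13b-Instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights to fit one GPU
    device_map="auto",  # place layers on the GPU, spilling to CPU RAM if needed
    torch_dtype=torch.float16,
)

prompt = "[INST] Write a Python function that reverses a string. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))

The device_map="auto" setting will also spill layers to CPU RAM when the GPU is too small, which is a ready-made (if slow) version of the CPU offloading mentioned above.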

DyeKuu avatar Sep 12 '23 08:09 DyeKuu

You may want to take a look at #82 for quantization if the use case is inference only.

If you can run batch_size = 1, then as discussed above (#77 (comment)), gradient accumulation can help you simulate large-batch training.

If you cannot even run batch_size = 1, then the only way I can think of is CPU offloading (a fairly naive form of pipeline parallelism) for part of the model, but I presume that requires a lot of heavy lifting on your side.

I am unable to find the file where the code for gradient accumulation is written. Can you tell me the name of the file?

manoj21192 avatar Sep 18 '23 06:09 manoj21192

I'm not too sure what you are all trying to explain with your solutions, but shouldn't we be able to run CodeLlama-13b-Instruct on an NVIDIA RTX 4090 with 24 GB of GPU RAM? The model should fit in its memory. Also GH, you are talking about the config file for the PyTorch framework, not the CodeLlama codebase from this git repository. Why would we want to change anything in the PyTorch library files? Is there not a way to configure this through a parameter we are missing in the Llama.build() function?

From my point of view GH, you seem to know a lot of torch stuff, but have you tried running the default example_instructions.py file with the torchrun wrapper? That is the program we are trying to run and modify, not the defaults in the framework.
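
For reference, the README launches it with something like: torchrun --nproc_per_node 2 example_instructions.py --ckpt_dir CodeLlama-13b-Instruct/ --tokenizer_path CodeLlama-13b-Instruct/tokenizer.model --max_seq_len 512 --max_batch_size 4 (paths adjusted to wherever the weights were downloaded). The --nproc_per_node value is the MP size, and as far as I can tell the loader asserts that it matches the number of checkpoint shards (two .pth files for 13B), which is why simply passing 1 with the unmodified 13B weights does not just work.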

DragonAngel1st avatar Sep 20 '23 20:09 DragonAngel1st