Setting CUDA_VISIBLE_DEVICES has no effect when get_peft_model is imported (even importing only this method) before loading the model with load_checkpoint_and_dispatch
System Info
SOFTWARE:
- `Accelerate` version: 0.17.1
- Platform: centos 7
- Python version: 3.9.7
- Numpy version: 1.20.3
- PyTorch version (GPU?): 2.0.0(True)
- `Accelerate` default config:
Not found
HARDWARE:
GPU: 8 × Tesla T4 (16 GB each)
System RAM: over 500 GB
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
```python
import torch
from transformers.models.localglm.modeling_chatglm import ChatGLMForConditionalGeneration
from transformers.models.localglm.tokenization_chatglm import ChatGLMTokenizer
from transformers.models.localglm.configuration_chatglm import ChatGLMConfig
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
import os
from peft import get_peft_model

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

BASE_DIR = "/path/llama/chatglm-6b/"
CONFIG_PATH = BASE_DIR + "config.json"
TOKENIZER_PATH = BASE_DIR + "tokenizer_config.json"

config = ChatGLMConfig.from_pretrained(CONFIG_PATH)
tokenizer = ChatGLMTokenizer.from_pretrained(BASE_DIR)

with init_empty_weights():
    model = ChatGLMForConditionalGeneration(config)
model.tie_weights()

model_wrapper = load_checkpoint_and_dispatch(
    model,
    BASE_DIR,
    device_map="auto",
    dtype=torch.float16,
    no_split_module_classes=["GLMBlock"],
)

response, history = model_wrapper.chat(tokenizer, "", history=[])
print(response)
```
Expected behavior
This script should use only 4 GPU cards, but it uses all 8.
When I move `from peft import get_peft_model` to after the call to `load_checkpoint_and_dispatch`, it works as expected. The script above only reproduces the issue; it is not my fine-tuning code.
My fine-tuning code uses peft and accelerate to fine-tune the ChatGLM model. It seems that bitsandbytes discards the `os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"` setting, but I am not sure.
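For what it's worth, here is a minimal check I would run to see whether the CUDA context is already initialized by the time the env var is set. This is only a sketch: it assumes my guess is right that peft's import of bitsandbytes touches CUDA, and the printed messages are just for illustration.

```python
# Same import order that triggers the problem:
from peft import get_peft_model  # noqa: F401  (bitsandbytes may init CUDA here)

import os
import torch

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # too late if CUDA is already up

try:
    # cuda:7 is the 8th card; it should not exist when the mask works
    torch.zeros(1, device="cuda:7")
    print("mask ignored: all 8 cards are still visible")
except RuntimeError:
    print("mask took effect: only 4 cards are visible")
```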
cc @pacman100
Hello @zhujc000, you need to set the env variables before importing `transformers`, `peft`, or `accelerate`. So the first cell should look like this:

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
```

The rest of the imports go in the next cell, followed by the rest of the code.
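For example, applied to the reproduction above (same paths and local ChatGLM modules as in the report, abbreviated here):

```python
# cell 1: set the device mask before ANY framework import
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
```

```python
# cell 2: imports and model loading, in the same order as the reproduction
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from peft import get_peft_model
# ... rest of the reproduction script unchanged
```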
It works, thanks!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.