
SFT fails with: ValueError: YiForCausalLM does not support Flash Attention 2.0 yet.

[Open] zhangxiann opened this issue 1 year ago · 8 comments

Running the SFT script as described in the README:

cd finetune/scripts
bash run_sft_Yi_6b.sh

Error output:

[2024-01-02 10:43:01,920] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 10:43:04,373] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0,1,2,3: setting --include=localhost:0,1,2,3
[2024-01-02 10:43:04,373] [INFO] [runner.py:570:main] cmd = /data/xxxx/conda/miniconda/envs/llm_yi/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --data_path ../yi_example_dataset/ --model_name_or_path /xxxxYi/Yi-6B --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --max_seq_len 4096 --learning_rate 2e-6 --weight_decay 0. --num_train_epochs 4 --training_debug_steps 20 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --num_warmup_steps 0 --seed 1234 --gradient_checkpointing --zero_stage 2 --deepspeed --offload --output_dir ./finetuned_model
[2024-01-02 10:43:06,184] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 10:43:07,963] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2024-01-02 10:43:07,963] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=4, node_rank=0
[2024-01-02 10:43:07,963] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2024-01-02 10:43:07,963] [INFO] [launch.py:163:main] dist_world_size=4
[2024-01-02 10:43:07,963] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2024-01-02 10:43:09,740] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 10:43:09,820] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 10:43:09,829] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 10:43:09,869] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-01-02 10:43:11,832] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-02 10:43:11,832] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-01-02 10:43:12,099] [INFO] [comm.py:637:init_distributed] cdb=None
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-01-02 10:43:12,181] [INFO] [comm.py:637:init_distributed] cdb=None
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-01-02 10:43:12,240] [INFO] [comm.py:637:init_distributed] cdb=None
tokenizer path exist
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
Traceback (most recent call last):
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 415, in <module>
    main()
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 253, in main
    model = create_hf_model(
  File "/data/xxxx/ai_parse/Yi/finetune/utils/model/model_utils.py", line 30, in create_hf_model
    model = model_class.from_pretrained(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3456, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1302, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1382, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: YiForCausalLM does not support Flash Attention 2.0 yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
(the same warning and traceback are printed by each of the 4 ranks, interleaved in the raw log)
[2024-01-02 10:43:16,976] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 90923
[2024-01-02 10:43:16,991] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 90924
[2024-01-02 10:43:16,991] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 90925
[2024-01-02 10:43:16,998] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 90926
[2024-01-02 10:43:17,006] [ERROR] [launch.py:321:sigkill_handler] ['/data/xxxx/conda/miniconda/envs/llm_yi/bin/python', '-u', 'main.py', '--local_rank=3', '--data_path', '../yi_example_dataset/', '--model_name_or_path', '/xxxxYi/Yi-6B', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', './finetuned_model'] exits with return code = 1
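
The check that raises here is transformers' per-architecture Flash Attention 2 gate: _check_and_enable_flash_attn_2 rejects any model class whose _supports_flash_attn_2 flag is unset, and the remote-code YiForCausalLM shipped with the old checkpoints never sets it. A minimal sketch against transformers 4.36 (the version installed below) showing the flag:

from transformers import LlamaForCausalLM, PreTrainedModel

# The FA2 gate consults this class attribute. Remote-code classes such as the
# old YiForCausalLM inherit the False default, while the built-in Llama
# architecture (which the re-released Yi checkpoints use) sets it to True.
print(PreTrainedModel._supports_flash_attn_2)   # False
print(LlamaForCausalLM._supports_flash_attn_2)  # True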

Environment

GPU: 4 × A100

config.json

{
  "architectures": [
    "YiForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_yi.YiConfig",
    "AutoModel": "modeling_yi.YiModel",
    "AutoModelForCausalLM":"modeling_yi.YiForCausalLM"
    },
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 200000,
  "model_type": "Yi",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "rope_theta": 5000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.34.0",
  "use_cache": true,
  "vocab_size": 64000
}

Python packages

accelerate                0.23.0
aiohttp                   3.8.6
aiosignal                 1.3.1
annotated-types           0.6.0
asttokens                 2.4.1
async-timeout             4.0.3
attrs                     23.1.0
beautifulsoup4            4.12.2
certifi                   2023.7.22
charset-normalizer        3.3.0
click                     8.1.7
cmake                     3.27.7
comm                      0.2.0
conda-pack                0.7.1
datasets                  2.14.5
debugpy                   1.8.0
decorator                 5.1.1
deepspeed                 0.12.2
dill                      0.3.7
einops                    0.7.0
exceptiongroup            1.2.0
executing                 2.0.1
filelock                  3.12.4
flash-attn                2.3.3
frozenlist                1.4.0
fsspec                    2023.6.0
hjson                     3.1.0
huggingface-hub           0.20.1
idna                      3.4
ipykernel                 6.28.0
ipython                   8.19.0
jedi                      0.19.1
Jinja2                    3.1.2
jsonschema                4.20.0
jsonschema-specifications 2023.12.1
jupyter_client            8.6.0
jupyter_core              5.5.1
lit                       17.0.2
MarkupSafe                2.1.3
matplotlib-inline         0.1.6
mpmath                    1.3.0
msgpack                   1.0.7
multidict                 6.0.4
multiprocess              0.70.15
nest-asyncio              1.5.8
networkx                  3.1
ninja                     1.11.1.1
numpy                     1.26.0
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.2.10.91
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusparse-cu11      11.7.4.91
nvidia-nccl-cu11          2.14.3
nvidia-nvtx-cu11          11.7.91
packaging                 23.2
pandas                    2.1.1
parso                     0.8.3
pexpect                   4.9.0
pip                       23.2.1
platformdirs              4.1.0
prompt-toolkit            3.0.43
protobuf                  4.25.1
psutil                    5.9.5
ptyprocess                0.7.0
pure-eval                 0.2.2
py-cpuinfo                9.0.0
pyarrow                   13.0.0
pydantic                  2.4.2
pydantic_core             2.10.1
Pygments                  2.17.2
pynvml                    11.5.0
python-dateutil           2.8.2
pytz                      2023.3.post1
PyYAML                    6.0.1
pyzmq                     25.1.2
ray                       2.7.0
referencing               0.32.0
regex                     2023.10.3
requests                  2.31.0
rpds-py                   0.16.2
safetensors               0.4.0
sentencepiece             0.1.99
setuptools                68.0.0
six                       1.16.0
soupsieve                 2.5
stack-data                0.6.3
sympy                     1.12
tokenizers                0.15.0
torch                     2.0.1
tornado                   6.4
tqdm                      4.66.1
traitlets                 5.14.0
transformers              4.36.2
triton                    2.0.0
typing_extensions         4.8.0
tzdata                    2023.3
urllib3                   2.0.6
wcwidth                   0.2.12
wheel                     0.41.2
xxhash                    3.4.1
yarl                      1.9.2

zhangxiann · Jan 02 '24

As pointed out in the comments, the latest model files need to be downloaded.
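
A sketch of one way to refresh the local copy, assuming the weights come from the Hugging Face Hub (repo id 01-ai/Yi-6B; the local directory is a placeholder):

from huggingface_hub import snapshot_download

# Re-download the full checkpoint so config.json and the weights match the
# latest release, then point --model_name_or_path at local_dir.
snapshot_download(repo_id="01-ai/Yi-6B", local_dir="/path/to/Yi-6B")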

zhangxiann · Jan 02 '24

After downloading the latest model files and running SFT again, it fails with the following error:

[2024-01-02 15:57:58,042] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:57:59,681] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-01-02 15:57:59,681] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-01-02 15:57:59,681] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-01-02 15:57:59,681] [INFO] [launch.py:163:main] dist_world_size=8
[2024-01-02 15:57:59,681] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2024-01-02 15:58:01,434] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:58:01,455] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:58:01,491] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:58:01,501] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:58:01,513] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:58:01,525] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:58:01,542] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-02 15:58:01,548] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-01-02 15:58:04,583] [INFO] [comm.py:637:init_distributed] cdb=None
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-01-02 15:58:05,357] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-02 15:58:05,357] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-02 15:58:05,362] [INFO] [comm.py:637:init_distributed] cdb=None
/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-01-02 15:58:05,393] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-02 15:58:05,393] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-02 15:58:05,393] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-02 15:58:05,395] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-02 15:58:05,401] [INFO] [comm.py:637:init_distributed] cdb=None
tokenizer path exist
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Traceback (most recent call last):
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 415, in <module>
    main()
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 253, in main
    model = create_hf_model(
  File "/data/xxxx/ai_parse/Yi/finetune/utils/model/model_utils.py", line 30, in create_hf_model
    model = model_class.from_pretrained(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3462, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1108, in __init__
    super().__init__(config)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1302, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1422, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes. You passed torch.float32, this might lead to unexpected behaviour.
(the same warnings and traceback are printed by each of the 8 ranks, interleaved in the raw log)
[2024-01-02 15:58:14,702] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59071
[2024-01-02 15:58:14,719] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59072
[2024-01-02 15:58:14,777] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59073
[2024-01-02 15:58:14,784] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59074
[2024-01-02 15:58:14,790] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59075
[2024-01-02 15:58:14,791] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59076
[2024-01-02 15:58:14,797] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59077
[2024-01-02 15:58:14,803] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 59078
[2024-01-02 15:58:14,810] [ERROR] [launch.py:321:sigkill_handler] ['/data/xxxx/conda/miniconda/envs/llm_yi/bin/python', '-u', 'main.py', '--local_rank=7', '--data_path', '../yi_example_dataset/', '--model_name_or_path', '/xxxx/Yi/Yi-6B', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', './finetuned_model'] exits with return code = 1

zhangxiann · Jan 02 '24

Similar to this issue. The solution provided there might be helpful.

markli404 · Jan 02 '24

> Similar to this issue. The solution provided there might be helpful.

It's because Yi's SFT code is missing these two arguments: [image]
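
The screenshot is not reproduced here; judging from the float32 ValueError above, the two arguments were presumably an explicit half-precision dtype and the non-deprecated FA2 flag passed to from_pretrained. A guess at the shape of the fix, not the actual patch:

import torch
from transformers import AutoModelForCausalLM

# Hypothetical reconstruction of the two missing arguments:
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/Yi-6B",                         # placeholder for --model_name_or_path
    torch_dtype=torch.bfloat16,               # FA2 rejects the float32 default
    attn_implementation="flash_attention_2",  # replaces use_flash_attention_2=True
)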

After adding them and running again, a new error appears:

Loading extension module cpu_adam...
Traceback (most recent call last):
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 415, in <module>
    main()
  File "/data/xxxx/ai_parse/Yi/finetune/sft/main.py", line 330, in main
    optimizer = AdamOptimizer(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 452, in load
    return self.jit_load(verbose)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 501, in jit_load
    op_module = load(name=self.name,
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /data/xxxx/.cache/torch_extensions/py310_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
(the same traceback is printed by each rank; on shutdown every process then prints:)
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f9aa027f520>
Traceback (most recent call last):
  File "/data/xxxx/conda/miniconda/envs/llm_yi/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

zhangxiann · Jan 03 '24


I saw a similar issue in the official DeepSpeed repo; it is probably due to the CUDA toolkit version. Hope you find this helpful: microsoft/DeepSpeed#1846
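
A quick way to check that hypothesis (a sketch; the cache path follows the py310_cu117 name seen in the traceback):

import os
import shutil
import torch

print("torch built with CUDA:", torch.version.cuda)  # '11.7' for this env
os.system("nvcc --version")  # compare against the system toolkit version

# If the versions disagree (or after aligning them), clear the stale JIT cache
# so DeepSpeed rebuilds cpu_adam.so on the next launch:
shutil.rmtree(os.path.expanduser("~/.cache/torch_extensions"), ignore_errors=True)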

markli404 · Jan 04 '24

I've run into a similar problem. Have you solved it, @zhangxiann?

Fred199683 · Jan 17 '24

> I've run into a similar problem. Have you solved it, @zhangxiann?

Not yet; I moved on to looking into other models instead.

zhangxiann · Jan 19 '24

Don't use flash-attn 2.0 or above; install flash-attn==1.0.4 instead.
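
A quick sanity check after reinstalling (a sketch):

import importlib.metadata

# Confirm the pinned build is the one the environment actually resolves.
print(importlib.metadata.version("flash-attn"))  # expect '1.0.4' per the suggestion above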

Minokun · Feb 02 '24