DB-GPT-Hub

lora train error - sh scripts/lora/lora.sh

Open AdjugateMatrix opened this issue 1 year ago • 4 comments

After running sh scripts/lora/lora.sh, the model and dataset seem to load successfully, but then I hit what looks like a torch problem (I am not sure). May I ask which torch and CUDA versions you are using?

[INFO] date:2023-08-13 22:11:05 
[2023-08-13 22:11:06,375] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/lz/anaconda3/envs/dbgpt_hub did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/1879,unix/lz-System'), PosixPath('local/lz-System')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('gnome-shell/PyCharm Professional Edition/1898-3-lz-System_TIME9345021')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
WARNING:root:Process rank: 0, device: cuda:0, n_gpu: 1
WARNING:root:distributed training: True, 16-bits training: False
WARNING:root:Training parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
full_finetune=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0002,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=adapterlora/runs/Aug13_22-11-07_lz-System,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=20,
logging_strategy=steps,
lr_scheduler_type=constant,
max_grad_norm=0.3,
max_steps=500,
metric_for_best_model=None,
model_max_length=1024,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
output_dir=adapterlora,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=adapterlora,
sample_generate=False,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
train_on_source=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Loading Model from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/modeling_utils.py:2193: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00,  5.59s/it]
WARNING:root:Adding LoRA modules...
WARNING:root:Get the get peft model...
WARNING:root:Using gradient checkpointing...
Loading tokenizer from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1714: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
WARNING:root:Successfully loaded model and tokenizer.
WARNING:root:Adding special tokens for /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B.
Using pad_token, but it is not set yet.
WARNING:root:Creating a supervised dataset and DataCollator...
Loading datasets: ['spider']
================================================================================
DatasetAttr: dataset_name: spider || hf_hub_url:  || local_path: sql_finetune_data.json 
data_formate: spider  || load_from_local: True || multi_turn: False
Lodding dataset from local path: sql_finetune_data.json
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27962.03it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1093.12it/s]
Generating train split: 8659 examples [00:00, 79507.69 examples/s]
The spider using spider dataset format.
By default, We support the spider dataset format.
Applying instruction template: default
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [00:00<00:00, 44940.77 examples/s]
Removing the unused columns, keep only 'input' and 'output'
You have set the max_train_samples: None, will do sampling ...
loaded dataset: spider   #train data size: 8659
Concatenated dataset list: ['spider'], #train dataset size: 8659
train_dataset: <class 'dbgpt_hub.data.sft_dataset.SFTInstructionDataset'>, mutlti-turn: False,  #length: 8659
Adding data collator:  <class 'dbgpt_hub.data.sft_dataset.DataCollatorForSupervisedDataset'>
WARNING:root:Creating a Trainer...
Traceback (most recent call last):
  File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 310, in <module>
    train()
  File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 274, in train
    trainer = Seq2SeqTrainer(
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 56, in __init__
    super().__init__(
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 499, in __init__
    self._move_model_to_device(model, args.device)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 741, in _move_model_to_device
    model = model.to(device)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 6 more times]
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
finished
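In case it is useful context: the traceback shows the failure happens when the Trainer moves the model to cuda:0 and hits a parameter that is still on the meta device (i.e. it was never materialized with real weights). A minimal diagnostic sketch, assuming model is the PEFT/LoRA model built in train_lora.py right before Seq2SeqTrainer is constructed:

    # Hypothetical check, placed just before "trainer = Seq2SeqTrainer(..." in train_lora.py.
    # Parameters still on the meta device have no data to copy, which is exactly what
    # trainer._move_model_to_device() trips over when it calls model.to("cuda:0").
    meta_params = [name for name, p in model.named_parameters() if p.is_meta]
    print(f"{len(meta_params)} parameters still on the meta device")
    for name in meta_params[:10]:
        print("  ", name)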

AdjugateMatrix · Aug 13 '23 14:08

After running sh scripts/lora/lora.sh, the model and dataset seem to load successfully, but then I hit what looks like a torch problem (I am not sure). May I ask which torch and CUDA versions you are using?

Sure. As I recall, the CUDA version seems to be 11.6 and the torch version 2.0+; I will go to the lab machine tomorrow to confirm.

wangzaistone · Aug 13 '23 17:08

After running sh scripts/lora/lora.sh, the model and dataset seem to load successfully, but then I hit what looks like a torch problem (I am not sure). May I ask which torch and CUDA versions you are using?

torch and CUDA version: 2.0.1+cu117 (CUDA 11.7); it works for me.
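For anyone comparing environments, the installed torch and CUDA build can be checked with a few standard torch calls (a minimal sketch):

    import torch

    print(torch.__version__)          # e.g. 2.0.1+cu117
    print(torch.version.cuda)         # CUDA version torch was built against, e.g. 11.7
    print(torch.cuda.is_available())  # should be True before starting GPU training
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))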

wangzaistone · Aug 14 '23 01:08

May I ask which GPUs, at minimum, can run torch==2.1.0? I'm working on a 2080 Ti with torch==2.1.0.

xiabo0816 · Nov 29 '23 11:11

May I ask which GPUs, at minimum, can run torch==2.1.0? I'm working on a 2080 Ti with torch==2.1.0.

At least the T4, V100, A100, and 4090 are OK for 2.1.0.
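To check whether a particular card is covered by an installed torch build, one option is to compare the card's compute capability against the architectures the build ships with (a small sketch using standard torch calls):

    import torch

    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))        # e.g. NVIDIA GeForce RTX 2080 Ti
        print(torch.cuda.get_device_capability(0))  # compute capability, e.g. (7, 5)
        print(torch.cuda.get_arch_list())           # sm_XX targets compiled into this torch build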

qidanrui · Nov 30 '23 00:11