DB-GPT-Hub
lora train error - sh scripts/lora/lora.sh
After running sh scripts/lora/lora.sh, the model and dataset seem to load successfully, but I then hit what looks like a torch problem (I am not sure). May I ask which torch and CUDA versions you use?
[INFO] date:2023-08-13 22:11:05
[2023-08-13 22:11:06,375] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/lz/anaconda3/envs/dbgpt_hub did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/1879,unix/lz-System'), PosixPath('local/lz-System')}
warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('gnome-shell/PyCharm Professional Edition/1898-3-lz-System_TIME9345021')}
warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
WARNING:root:Process rank: 0, device: cuda:0, n_gpu: 1
WARNING:root:distributed training: True, 16-bits training: False
WARNING:root:Training parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
full_finetune=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0002,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=adapterlora/runs/Aug13_22-11-07_lz-System,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=20,
logging_strategy=steps,
lr_scheduler_type=constant,
max_grad_norm=0.3,
max_steps=500,
metric_for_best_model=None,
model_max_length=1024,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
output_dir=adapterlora,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=adapterlora,
sample_generate=False,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
train_on_source=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Loading Model from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/modeling_utils.py:2193: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00, 5.59s/it]
WARNING:root:Adding LoRA modules...
WARNING:root:Get the get peft model...
WARNING:root:Using gradient checkpointing...
Loading tokenizer from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1714: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
WARNING:root:Successfully loaded model and tokenizer.
WARNING:root:Adding special tokens for /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B.
Using pad_token, but it is not set yet.
WARNING:root:Creating a supervised dataset and DataCollator...
Loading datasets: ['spider']
================================================================================
DatasetAttr: dataset_name: spider || hf_hub_url: || local_path: sql_finetune_data.json
data_formate: spider || load_from_local: True || multi_turn: False
Lodding dataset from local path: sql_finetune_data.json
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27962.03it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1093.12it/s]
Generating train split: 8659 examples [00:00, 79507.69 examples/s]
The spider using spider dataset format.
By default, We support the spider dataset format.
Applying instruction template: default
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [00:00<00:00, 44940.77 examples/s]
Removing the unused columns, keep only 'input' and 'output'
You have set the max_train_samples: None, will do sampling ...
loaded dataset: spider #train data size: 8659
Concatenated dataset list: ['spider'], #train dataset size: 8659
train_dataset: <class 'dbgpt_hub.data.sft_dataset.SFTInstructionDataset'>, mutlti-turn: False, #length: 8659
Adding data collator: <class 'dbgpt_hub.data.sft_dataset.DataCollatorForSupervisedDataset'>
WARNING:root:Creating a Trainer...
Traceback (most recent call last):
File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 310, in <module>
train()
File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 274, in train
trainer = Seq2SeqTrainer(
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 56, in __init__
super().__init__(
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 499, in __init__
self._move_model_to_device(model, args.device)
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 741, in _move_model_to_device
model = model.to(device)
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 6 more times]
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
finished
Sure. As I recall, the CUDA version is 11.6 and the torch version is 2.0+. I will check on the lab machine tomorrow to confirm.
torch and CUDA version: 2.0.1+cu117 (CUDA 11.7). That works for me.
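If it helps, here is a minimal sketch (not part of the original reply) for checking the installed torch build, the CUDA version it was compiled against, and whether a GPU is actually visible at runtime:

```python
# Sketch: print the installed torch build, the CUDA version it was compiled
# against, and whether a GPU is actually visible at runtime.
import torch

print("torch:", torch.__version__)            # e.g. 2.0.1+cu117
print("built for CUDA:", torch.version.cuda)  # e.g. 11.7
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```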
May I ask what the minimum GPU is that can run torch==2.1.0? I'm working on a 2080 Ti with torch==2.1.0.
At least a T4, V100, A100, or 4090 is OK for 2.1.0.
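As a rough self-check (a sketch, not from the original reply), you can query your card's compute capability with plain torch; a 2080 Ti (Turing) reports 7.5, while the cards listed above range from 7.0 (V100) to 8.9 (4090):

```python
# Sketch: query the compute capability of the first visible GPU.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
else:
    print("No CUDA device visible to torch.")
```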