LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
(chat_env) [root@192 LMFlow]# pip install -r requirements.txt Collecting peft@ git+https://github.com/huggingface/peft.git@deff03f2c251534fffd2511fc2d440e84cc54b1b Cloning https://github.com/huggingface/peft.git (to revision deff03f2c251534fffd2511fc2d440e84cc54b1b) to /tmp/pip-install-qp3u6wif/peft_bb110dd6776941069294566614e126c0 Running command git clone --quiet https://github.com/huggingface/peft.git /tmp/pip-install-qp3u6wif/peft_bb110dd6776941069294566614e126c0 fatal: unable to access 'https://github.com/huggingface/peft.git/': TCP...
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1'
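That warning usually means the installed PyTorch build cannot see a CUDA runtime, so DeepSpeed falls back to CUDA_HOME. A minimal sketch to check this, assuming PyTorch is already installed (nothing below is taken from the original report):

```python
# Minimal sketch: verify that the installed torch build ships a CUDA runtime
# and can see at least one GPU. If is_available() is False, DeepSpeed will
# only have CUDA_HOME to work with and its CUDA ops may fail to build.
import torch

print("torch version:   ", torch.__version__)
print("built with CUDA: ", torch.version.cuda)        # None for CPU-only wheels
print("CUDA available:  ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:    ", torch.cuda.device_count())
    print("device 0 name:   ", torch.cuda.get_device_name(0))
```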
(lmflow) PS C:\Users\Satan> pip install deepspeed==0.8.3 Looking in indexes: https://mirrors.aliyun.com/pypi/simple/ Collecting deepspeed==0.8.3 Using cached https://mirrors.aliyun.com/pypi/packages/0f/c0/9b57e9ec56f6f405726a384b109f8da1267e41feea081850c2fce1735712/deepspeed-0.8.3.tar.gz (765 kB) Preparing metadata (setup.py) ... error error: subprocess-exited-with-error × python setup.py egg_info did not...
AFAIK, there are two methods to continue finetuning based on a previously finetuned model: 1. (base model + lora model + dataset1) -> (base model + finetuned model + dataset2) 2. (base...
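A minimal sketch of method 1, assuming a PEFT LoRA adapter was saved from the first run and a peft version that provides merge_and_unload(); all paths below are hypothetical placeholders:

```python
# Sketch of method 1: merge the first LoRA adapter into the base model,
# then treat the merged model as the new base for finetuning on dataset2.
# All paths are hypothetical placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base_model")
lora = PeftModel.from_pretrained(base, "path/to/lora_adapter_from_dataset1")

merged = lora.merge_and_unload()           # folds the LoRA deltas into the base weights
merged.save_pretrained("path/to/merged_model")
# Point --model_name_or_path at path/to/merged_model for the second finetuning run.
```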
[2023-04-21 22:17:06,284] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]} [2023-04-21 22:17:06,284] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0 [2023-04-21 22:17:06,284] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]}) [2023-04-21 22:17:06,284] [INFO] [launch.py:162:main] dist_world_size=1 [2023-04-21 22:17:06,284]...
**Describe the bug** (gh_lmflow) ub2004@ub2004-B85M-A0:~/llm_dev/LMFlow$ bash ./scripts/run_finetune.sh [2023-04-20 15:03:57,800] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. [2023-04-20 15:03:57,810] [INFO] [runner.py:550:main] cmd = /usr/bin/python3...
* Restarting with stat [2023-04-20 08:22:35,933] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:10086 (errno: 98 -...
[2023-04-09 13:43:32,793] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 11961 [2023-04-09 13:43:32,907] [ERROR] [launch.py:324:sigkill_handler] ['/home/seali/anaconda3/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/seali/LMFlow-main/data/alpaca/train', '--output_dir', '/home/seali/LMFlow-main/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1',...
When I run ./scripts/run_chatbot.sh to test the model, the following message appears: ./scripts/run_chatbot.sh: line 3: /home/ubuntu/llama-7b-hf: Is a directory [2023-04-19 16:43:07,239] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. Detected CUDA_VISIBLE_DEVICES=1: setting --include=localhost:1 [2023-04-19 16:43:07,265]...
I can confirm that 29500 is not being used... Traceback (most recent call last): File "/data1/xxx/chat/LMFlow/service/app.py", line 35, in <module> model = AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config, init_method="tcp://localhost:29501") File "/data1/xxx/chat/LMFlow/src/lmflow/models/auto_model.py", line 16, in...
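A minimal sketch for checking whether a candidate port is actually free before passing it via init_method="tcp://localhost:<port>"; the port numbers below are only examples, not taken from any particular setup:

```python
# Sketch: probe whether a TCP port on localhost can be bound. A port that
# fails to bind here will also fail for the distributed init_method.
import socket

def port_is_free(port, host="localhost"):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

for port in (29500, 29501):
    print(port, "free" if port_is_free(port) else "in use")
```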
I've been fine-tuning a model with LoRA and now I want to continue that process for another related task, using a different dataset. I noticed that there is a pre-finetuned...
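One possible way to continue training an existing LoRA adapter on a new dataset without merging it first is sketched below, assuming a peft version whose PeftModel.from_pretrained supports the is_trainable flag; all paths are hypothetical placeholders:

```python
# Sketch: keep the base model frozen and continue training the existing
# LoRA adapter on the new dataset. Paths are hypothetical placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base_model")
model = PeftModel.from_pretrained(
    base,
    "path/to/lora_adapter_from_first_task",
    is_trainable=True,   # requires a peft version that supports this flag
)
# Hand `model` to the usual training loop with the new dataset;
# only the LoRA parameters remain trainable.
```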