icybee
> The auth problem seems to be caused by an API update; you can try `pip install chatgptpy --upgrade` to update the package. The tkinter issue below seems to appear only on Windows, and only the first time the app runs. Let me look into what's going on there. Also, the Mac seems to kill the process. I don't know whether it's just my Mac or a general phenomenon.

My Mac does this too; I'm not sure why.
I haven't run into this error yet; I'll take a closer look shortly.
Finetuning on 4x V100 GPUs, I'm having the same issue here: train loss stays at 0.0 and eval loss stays at nan, right from the start to the end.
> @keelezibel @bupticybee When I finetune the 7B model, setting lora_target_modules = ['q_proj','k_proj','v_proj','o_proj'] avoids the loss always being 0.0. But when I finetune the 13B model, setting lora_target_modules = ['q_proj','k_proj','v_proj','o_proj'] doesn't work. I wonder...
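For context, the difference between those two runs is just which attention projections get LoRA adapters attached. A minimal sketch of the corresponding `peft` config (the rank/alpha/dropout values below are illustrative assumptions, not taken from this thread):

```
from peft import LoraConfig, get_peft_model

# Attach LoRA adapters to all four attention projections instead of the
# narrower default ['q_proj', 'v_proj']. Hyperparameters are illustrative.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# model = get_peft_model(model, config)  # wraps the base LLaMA model
```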
> @bupticybee I think you should fine-tune the full model, but the V100 will OOM.

Yes, that's kind of the reason I tried to use LoRA in the...
> @bupticybee I think you should fine-tune the full model, but the V100 will OOM.

So is it a V100 problem? Please correct me if I'm wrong...
I use the following script:

```
OMP_NUM_THREADS=8 WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \
    --master_port=9967 finetune.py \
    --base_model '../llama-13b/' \
    --data_path 'alpaca_data.json' \
    --output_dir './lora-alpaca_1' \
    --lora_target_modules "['q_proj','k_proj','v_proj','o_proj']"
```

(the list argument is quoted so the shell doesn't treat the brackets as a glob pattern) to run in...
> I am starting another fine-tuning cycle with the 7B full model and it seems to work. At least the training loss didn't go to zero. But I have to...
I finally got it working on 4x V100s. I removed the following line from finetune.py:

```
model = prepare_model_for_int8_training(model)
```

and set `load_in_8bit=False`. I do not use torchrun, which...
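For anyone else hitting this, the change amounts to loading the base model without int8 quantization. A rough sketch of what the loading code in finetune.py looks like after the edit (assuming the usual transformers/peft setup in alpaca-lora-style scripts; casting to fp16 is my assumption for fitting 13B on V100s, and the thread doesn't confirm why the int8 path fails on this hardware):

```
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "../llama-13b/",
    load_in_8bit=False,          # was True in the stock script
    torch_dtype=torch.float16,   # assumption: fp16 so 13B fits without int8
    device_map="auto",
)

# The stock script follows with:
#     model = prepare_model_for_int8_training(model)
# That is the line removed here, since the model is no longer loaded in int8.
```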