icybee
> The auth problem seems to be caused by an API update; you can try `pip install chatgptpy --upgrade` to update the package. The tkinter issue below seems to appear only on Windows, and only the first time the app runs. Let me look into what's going on there. Also, the Mac seems to kill the process. I don't know whether it's just my Mac or a general phenomenon.

My Mac does this too; I'm not sure why.
I haven't run into this error yet; I'll take a closer look shortly.
Finetuning on 4x V100 GPUs, I'm having the same issue here: train loss stays at 0.0 and eval loss stays at nan, right from the start to the end.
> @keelezibel @bupticybee When I finetune the 7B model, setting lora_target_modules = ['q_proj','k_proj','v_proj','o_proj'] avoids the loss always being 0.0. But when I finetune the 13B model, setting lora_target_modules = ['q_proj','k_proj','v_proj','o_proj'] doesn't work. I wonder...
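For context, the difference between those two runs is just which attention projections get LoRA adapters attached. A minimal sketch of the corresponding `peft` config (the rank/alpha/dropout values below are illustrative assumptions, not taken from this thread):

```
from peft import LoraConfig, get_peft_model

# Attach LoRA adapters to all four attention projections instead of the
# narrower default ['q_proj', 'v_proj']. Hyperparameters are illustrative.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# model = get_peft_model(model, config)  # wraps the base LLaMA model
```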
> @bupticybee I think you should fine-tune the full model, but the V100 will OOM.

Yes, that's kind of the reason I tried to use LoRA in the...
> @bupticybee I think you should fine-tune the full model, but the V100 will OOM.

So is it a V100 problem? Please correct me if I'm wrong...
I use the following script:

```
OMP_NUM_THREADS=8 WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \
    --master_port=9967 finetune.py \
    --base_model '../llama-13b/' \
    --data_path 'alpaca_data.json' \
    --output_dir './lora-alpaca_1' \
    --lora_target_modules "['q_proj','k_proj','v_proj','o_proj']"
```

(the list argument is quoted so the shell doesn't treat the brackets as a glob pattern) to run in...
> I am starting another fine-tuning cycle with the 7B full model and it seems to work. At least the training loss didn't go to zero. But I have to...
I finally got it working on 4x V100s. I removed the following line from finetune.py:

```
model = prepare_model_for_int8_training(model)
```

and set `load_in_8bit=False`. I do not use torchrun, which...
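For anyone else hitting this, the change amounts to loading the base model without int8 quantization. A rough sketch of what the loading code in finetune.py looks like after the edit (assuming the usual transformers/peft setup in alpaca-lora-style scripts; casting to fp16 is my assumption for fitting 13B on V100s, and the thread doesn't confirm why the int8 path fails on this hardware):

```
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "../llama-13b/",
    load_in_8bit=False,          # was True in the stock script
    torch_dtype=torch.float16,   # assumption: fp16 so 13B fits without int8
    device_map="auto",
)

# The stock script follows with:
#     model = prepare_model_for_int8_training(model)
# That is the line removed here, since the model is no longer loaded in int8.
```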